TrustRadius Insights for Apache Airflow are summaries of user sentiment data from TrustRadius reviews and, when necessary, third-party data sources.
Business Problems Solved
Apache Airflow has proven to be a versatile solution for managing and orchestrating various data tasks. Users have utilized this product as a core component for scheduling and monitoring jobs, inspecting job successes and failures, and troubleshooting errors. It has also been extensively employed in GCP as part of Cloud Composer for running ETL jobs, streamlining data pipelines, and creating workflows for analytics and reporting.
Reviewers have found Apache Airflow easy to configure and set up, making it ideal for orchestrating data flows and building enterprise data pipelines. Its ability to integrate with third-party solutions via APIs allows for seamless data access and integration. Users have also appreciated the product's capability to manage ETL pipelines and programmatically monitor data pipelines.
Another valuable use case of Apache Airflow is its role in creating workflows, orchestrating data pipelines, and automating tasks. Its flexibility has been particularly beneficial when dealing with complex data pipelines from diverse sources. Furthermore, the product has been effective in performing data integration in an AWS S3 region, connecting to relational databases, executing data extracts, and compiling them into multiple flat file segments.
Apache Airflow brings standardization and modularity to data pipelines, enabling the implementation of complex pipelines and facilitating the sharing of data with partners as well as scoring machine learning models. Overall, users have found this product to be a valuable tool for managing data tasks efficiently and effectively.
Apache Airflow Reviews
4 Reviews
I am part of the data platform team, where we are responsible for building the platform for data ingestion, an aggregation system, and the compute engines. Apache Airflow is one of the core systems responsible for orchestrating pipelines and scheduled workflows. We have multiple deployments of Apache Airflow running for different use cases, each with a workload of 5,000 to 9,000 DAGs and executing even more DAGs. Apache Airflow now also offers HA with scheduler replicas, which is a lifesaver, and it is well-maintained by the community.
Pros
Apache Airflow is one of the best orchestration platforms and a go-to scheduler for teams building a data platform or pipelines.
Apache Airflow supports multiple operators, such as the Databricks, Spark, and Python operators. All of these provide us with the functionality to implement any business logic.
Apache Airflow is highly scalable, and we can run a large number of DAGs with ease. It provides HA and replication for workers. Maintaining Airflow deployments is very easy, even for smaller teams, and we also get lots of metrics for observability.
Cons
To achieve a production-ready deployment of Apache Airflow, you need some level of expertise. A repository of officially maintained sample Helm chart configurations would be handy for a new team.
As Airflow is used to build many data pipelines, a feature for building lineage from the queries run on different compute engines would help develop the data catalog. Typically, multiple tools are required for this use case.
For building a data pipeline from upstream to downstream tables, using Airflow with lineage to trigger the downstream DAGs after recovery will be helpful. Additionally, creating a dependency between the DAGs would be beneficial.
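The lineage-driven rerun this reviewer asks for can be pictured as a graph traversal: given a table that has just recovered, find everything built from it. The following is only a minimal sketch in plain Python, not an Airflow feature or API. The table names, the `lineage` mapping, and the `downstream_of` helper are all hypothetical and were chosen for illustration.

```python
from collections import deque

# Hypothetical lineage: each table maps to the tables built directly from it.
lineage = {
    "raw_events": ["clean_events"],
    "clean_events": ["daily_sessions", "daily_revenue"],
    "daily_sessions": ["weekly_report"],
    "daily_revenue": ["weekly_report"],
    "weekly_report": [],
}


def downstream_of(table, graph):
    """Collect every table reachable from `table`, in breadth-first order."""
    seen = {table}
    queue = deque([table])
    result = []
    while queue:
        current = queue.popleft()
        for child in graph.get(current, []):
            if child not in seen:  # visit each downstream table only once
                seen.add(child)
                result.append(child)
                queue.append(child)
    return result


# After "clean_events" recovers, these are the tables whose pipelines
# would need to be triggered again.
to_rerun = downstream_of("clean_events", lineage)
```

In a real deployment the lineage mapping would have to come from query parsing or a catalog tool, which is the part the reviewer notes currently requires additional tooling.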
Likelihood to Recommend
Airflow is well-suited for data engineering pipelines, creating scheduled workflows, and working with various data sources. You can implement almost any kind of DAG for any use case using the different operators, or implement your own logic using the Python operator with ease. The MLOps feature of Airflow can be enhanced to match MLflow-like features, making Airflow the go-to solution for all workloads, from data science to data engineering.
We use Apache Airflow as an orchestration tool for data engineering workflows in a gaming product. We schedule multiple jobs at hourly, daily, weekly, and monthly intervals. We have many requirements for dependent jobs (for example, job1 must run before job2), and Apache Airflow handles this swiftly. We use multiple Apache Airflow integrations with webhooks and APIs. Additionally, we do a lot of job monitoring and SLA-miss tracking using Apache Airflow features.
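The dependent-job requirement described here (job1 must run before job2) amounts to ordering a directed acyclic graph. As a rough illustration of the idea only, and not of Airflow's own API, the sketch below uses Python's standard-library `graphlib`. The job names and the `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs; each key maps to the set of jobs that must finish first.
dependencies = {
    "job2": {"job1"},
    "job3": {"job2"},
    "report": {"job2", "job3"},
}


def run_order(deps):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
```

A scheduler such as Airflow adds timing, retries, and monitoring on top of this ordering. A dependency cycle would make `static_order()` raise `graphlib.CycleError`, which mirrors why workflows are modelled as acyclic graphs.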
Pros
Job scheduling
Dependent job workflows
Failure handling and rerun of workflows
Cons
Better User Interface
Likelihood to Recommend
Dependent job scheduling
Rerun mechanism of workflows
High availability deployment strategies
We use Apache Airflow to streamline the data pipelines, create workflows according to the needs of the project, and monitor overall functionality. In addition, we use Apache Airflow to solve the problem of retrieving data from Hive before creating the workflow in its entirety. It's also utilized for automation.
Pros
In charge of the ETL processes.
As there is no incoming or outgoing data, we may handle the scheduling of tasks as code and avoid the requirement for monitoring.
Cons
There is no way to assess the processes because they do not keep the metadata.
Python is currently the only language supported for creating programmed pipelines.
They need to implement both event-based and time-based scheduling.
Likelihood to Recommend
I handle our pipeline scheduling and monitoring. I had minimal problems with Apache Airflow. It's well-suited for data engineers who are responsible for the creation of the data workflows. It is also best suited for the scheduling of the workflow; it allows us to execute Python scripts as well. Finally, Apache Airflow is best suited for the circumstances in which we need a scalable solution.
Verified User
Engineer in Information Technology (Information Technology & Services company, 10,001+ employees)
Apache Airflow is a great way to orchestrate workflows and build enterprise data pipelines. It is very easy to configure and set up, and it would be my go-to solution for orchestrating data flows. We use Airflow to integrate our solution via APIs and allow third-party solutions to access our solution and the data held within it.
Pros
Orchestrate workflows
Visualise workflows easily using DAG
Integrate 3rd party data sources
Cons
Visualisation UI could be improved in my opinion.
Enterprise features
Performance improvements in bigger deployments.
Likelihood to Recommend
Well suited for anyone who wants to orchestrate data pipelines and workflows. It is good for developing, scheduling, and monitoring data workflows and is capable of managing complex enterprise workloads and pipelines. The visual aspect of understanding how your workflows are interconnected is especially useful.