
Batch data processing is too slow for real-time AI: How open-source Apache Airflow 3.0 solves the challenge with event-driven data orchestration




Moving data from various sources into the right place for AI use is a challenging task. That's where data orchestration technologies such as Apache Airflow fit in.

Today, the Apache Airflow community is out with its biggest update in years, with the debut of the 3.0 release. The new release is the first major version update in four years. Airflow has been active, though, steadily iterating on the 2.x series, including the 2.9 and 2.10 updates in 2024, which both focused heavily on AI.

In recent years, data engineers have adopted Apache Airflow as a de facto standard. Apache Airflow has established itself as the leading open-source workflow orchestration platform, with more than 3,000 contributors and widespread adoption across Fortune 500 companies. There are also multiple commercial services based on the platform, including Astronomer Astro, Google Cloud Composer, Amazon Managed Workflows for Apache Airflow (MWAA) and Microsoft Azure Data Factory Managed Airflow, among others.

As organizations struggle to coordinate data workflows across disparate systems, clouds and AI workloads, their needs keep growing. Apache Airflow 3.0 addresses critical enterprise needs with an architectural redesign that could improve how organizations build and deploy data applications.

“For me, Airflow 3 is a new beginning; it is a foundation for a much bigger set of capabilities,” Vikram Koka, Apache Airflow PMC (project management committee) member and chief strategy officer at Astronomer, told VentureBeat in an exclusive interview. “This is almost a complete refactor based on what enterprises told us they needed for the next level of mission-critical adoption.”

Enterprise data complexity has changed data orchestration needs

As companies increasingly depend on data-driven decision-making, the complexity of data workflows has exploded. Organizations now run intricate pipelines spanning multiple cloud environments, diverse data sources and increasingly sophisticated AI workloads.

Airflow 3.0 appears purpose-built to meet these evolving enterprise needs. Unlike previous versions, this release breaks away from a monolithic package, introducing a distributed client model that provides flexibility and security. This new architecture allows enterprises to:

  1. Execute tasks across multiple cloud environments.
  2. Implement granular security controls.
  3. Support diverse programming languages.
  4. Enable true multi-tenant deployments.

Airflow 3.0's expanded language support is also interesting. While previous versions were primarily Python-centric, the new release supports multiple programming languages.

Airflow 3.0 is set to support Python and Go, with planned support for Java, TypeScript and Rust. This approach means data engineers can write tasks in their preferred programming language, reducing friction in workflow development and integration.

Event-driven capabilities transform data processing

Airflow has traditionally excelled at scheduled batch processing, but enterprises increasingly need real-time data processing capabilities. Airflow 3.0 now supports that need.

“A major change in Airflow 3 is what we call event-driven scheduling,” Koka explained.

Instead of running a data-processing job every hour, Airflow now starts the job automatically when a specific data file is uploaded or when a particular message appears. This could include data loaded into an Amazon S3 cloud storage bucket or a streaming data message in Apache Kafka.

The event-driven scheduling capability addresses the gap between ETL [extract, transform and load] tools and stream-processing frameworks such as Apache Flink or Apache Spark Structured Streaming, allowing organizations to use a single orchestration layer for both scheduled and event-driven workflows.
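In Airflow 3.0's asset-based scheduling model, this pattern can be sketched roughly as follows. This is a minimal illustration, not production code: the bucket path, DAG name and task names are hypothetical, and the exact import path for `Asset` may differ slightly depending on the Airflow 3 release in use.

```python
# Sketch of event-driven scheduling in Airflow 3.0: the DAG runs when the
# referenced data asset is updated, instead of polling on a fixed cron
# schedule. All names here are illustrative, not taken from the article.
from airflow.sdk import Asset, dag, task

orders_feed = Asset("s3://example-bucket/incoming/orders")  # hypothetical path

@dag(schedule=[orders_feed])  # trigger on asset update, not on a timetable
def process_orders():
    @task
    def load_and_transform():
        # Downstream processing kicks off as soon as new data lands.
        ...

    load_and_transform()

process_orders()
```

Because the DAG is keyed to the asset rather than to a clock, there is no hourly polling job sitting idle between data arrivals.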

Airflow will accelerate enterprise AI inference execution and compound AI

Event-driven data orchestration will also help Airflow support rapid inference execution.

For example, Koka detailed a use case where real-time inference is used for professional services such as legal time tracking. In this scenario, Airflow can help collect raw data from sources such as calendars, emails and documents. A large language model (LLM) can then transform the unstructured information into structured data. Another pre-trained model can then analyze the structured time-tracking data, determine whether the work is billable, and assign the appropriate billing codes and rates.
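The multi-model pipeline described above can be sketched in plain Python. This is a stub, not the actual system Koka described: the LLM and classifier stages are stand-ins implemented with trivial logic, and every function, field and rate value is hypothetical.

```python
# Sketch of a compound AI pipeline for legal time tracking: each stage stands
# in for a model call (LLM extraction, billability classification), stubbed
# here with simple rules so the control flow is runnable end to end.
from dataclasses import dataclass

@dataclass
class TimeEntry:
    client: str
    hours: float
    description: str
    billable: bool = False
    rate: float = 0.0

def extract_entries(raw_events):
    # Stand-in for an LLM turning unstructured calendar/email text
    # into structured time entries.
    return [
        TimeEntry(client=ev["client"], hours=ev["hours"], description=ev["summary"])
        for ev in raw_events
    ]

def classify_billable(entry):
    # Stand-in for a pre-trained model deciding billability.
    entry.billable = "internal" not in entry.description.lower()
    return entry

def assign_rate(entry, rate_card):
    # Apply the client's hourly rate only to billable work.
    entry.rate = rate_card.get(entry.client, 0.0) if entry.billable else 0.0
    return entry

def run_pipeline(raw_events, rate_card):
    entries = extract_entries(raw_events)
    return [assign_rate(classify_billable(e), rate_card) for e in entries]

if __name__ == "__main__":
    events = [
        {"client": "Acme", "hours": 2.0, "summary": "Contract review call"},
        {"client": "Acme", "hours": 1.0, "summary": "Internal team sync"},
    ]
    for e in run_pipeline(events, {"Acme": 400.0}):
        print(e.client, e.billable, e.rate)
```

In an orchestrated deployment, each stage would be a separate Airflow task so that failures, retries and observability apply per model step rather than to the pipeline as a whole.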

Koka referred to this approach as a compound AI system – a workflow that combines different AI models to complete a complex task efficiently and intelligently. Airflow 3.0's event-driven architecture makes this type of real-time, multi-step inference process possible across a variety of enterprise use cases.

Compound AI is an approach first defined by the Berkeley Artificial Intelligence Research Lab in 2024, and it is slightly different from agentic AI. Koka explained that agentic AI allows for autonomous AI decision-making, whereas compound AI has predefined workflows that are more predictable and reliable for business use cases.

Playing ball with Airflow: how the Texas Rangers look to benefit

Among the many users of Airflow is the Texas Rangers Major League Baseball team.

Oliver Dykstra, full-stack data engineer at the Texas Rangers Baseball Club, told VentureBeat that the team uses Airflow, hosted on Astronomer's Astro platform, as the “nerve center” of its baseball data operations. He noted that all player development, contracts, analytics and, of course, game data is orchestrated through Airflow.

“We're looking forward to upgrading to Airflow 3 and its improvements to event-driven scheduling, observability and data lineage,” Dykstra said. “As we already rely on Airflow to manage our AI/ML pipelines, the added efficiency and reliability of Airflow 3 will help increase trust and resiliency of these data products within our entire organization.”

What this means for enterprise AI adoption

For technical decision-makers evaluating data orchestration strategy, Airflow 3.0 delivers actionable benefits that can be implemented in phases.

The first step is to evaluate current data workflows that would benefit from the new event-driven capabilities. Organizations can identify data pipelines that currently run as scheduled jobs but would be managed more efficiently with event-based triggers. This shift can significantly reduce processing latency while eliminating wasteful polling operations.

Next, technology leaders should assess their development environments to determine whether Airflow's new language support could unify fragmented orchestration tooling. Teams that currently maintain separate orchestration tools for different language environments can begin planning a migration strategy to simplify their technology stack.

For enterprises leading the way in AI implementation, Airflow 3.0 represents a critical piece of infrastructure that can address a major challenge in AI adoption: orchestrating complex, multi-stage AI workflows across the enterprise. The platform's ability to coordinate compound AI systems could help organizations move beyond proof-of-concept to enterprise-wide AI deployment with proper governance, security and reliability.




2025-04-22 20:00:00
