We’re seeing this all the time - taking traditional workflow orchestration tools and wiring LLMs into them. It becomes a lot easier to build these because the complexity comes from a) the model, which frontier labs are making easy, and b) productionizing a workflow, which workflow orchestration tools make easy. It’s also easy to recognize value because these workflows are often grounded in existing work and thus easy to measure.
We see these patterns so much that we packaged them up for Airflow (one of the most popular workflow tools)!
Thanks for the feedback. I should probably make it more clear - there is no description field.
The block of text is the 'Likely to outsource' column. I use Perplexity Deep Research to try to infer what services the company might need based on its probable current challenges.
In this case it was: "Cloud infrastructure support to manage rapid scaling, pivoting, and new product integrations as part of business model transformation[3]., Specialized consulting (e.g., industry-specific financial regulations, fintech compliance, and go-to-market strategy) to facilitate entry into new verticals and optimize operational resilience[2]."
Airflow actually uses decorators to indicate something is an explicit task in a data pipeline vs just a utility function, so this follows that pattern!
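For anyone who hasn't used Airflow's TaskFlow API, the vanilla pattern looks roughly like this (names below are just illustrative):

    from airflow.decorators import dag, task

    @dag(schedule=None, catchup=False)
    def example_pipeline():
        def normalize(x: str) -> str:
            # plain utility function - not tracked by Airflow
            return x.strip().lower()

        @task
        def extract() -> list[str]:
            # decorated = an explicit task the scheduler knows about
            return ["  Foo ", "BAR"]

        @task
        def transform(items: list[str]) -> list[str]:
            return [normalize(i) for i in items]

        transform(extract())

    example_pipeline()

The SDK's decorators slot into the same mental model, just with an LLM call behind them.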
It also uses an "operator" under the hood (Airflow's term for a pre-built, parameterized task), which can be subclassed if you want to customize anything.
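The standard Airflow escape hatch for that kind of customization is subclassing an operator - something like this (the class here is made up; the SDK's actual operator names may differ):

    from airflow.models import BaseOperator

    class MyAgentOperator(BaseOperator):  # hypothetical subclass
        def __init__(self, prompt: str, **kwargs):
            super().__init__(**kwargs)
            self.prompt = prompt

        def execute(self, context):
            # run the model/agent here; the return value is pushed to XCom
            # like any other Airflow task result
            return f"result for: {self.prompt}"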
It is _potentially_ more restrictive than writing pure Python functions, but the plus side is that we can interject certain Airflow-specific features into how the agent runs. And this isn't meant for someone who knows agents inside & out / wants the low-level customizability.
The best example of this today is log groups: Airflow lets you log things out as part of a "group" which has some UI abstractions to make it easier. This SDK takes the raw agent tool calls and turns them each into a log group, so you can see a) at a high level what the agent is doing, and b) drill down into a specific tool call to understand what's happening within the tool call.
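If I remember right, the underlying mechanism is the ::group:: / ::endgroup:: markers that newer Airflow versions collapse into expandable sections in the task log UI, so conceptually the SDK is doing something like:

    import logging

    log = logging.getLogger(__name__)

    def log_tool_call(tool_name: str, args: dict, output: str):
        # everything between the markers shows up as one collapsible
        # group per tool call in the Airflow log viewer
        log.info("::group::Tool call: %s", tool_name)
        log.info("args: %s", args)
        log.info("output: %s", output)
        log.info("::endgroup::")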
To your point about the `@task.llm_branch`, the SDK & Pydantic AI (which the SDK uses under the hood) will re-prompt the LLM up to a certain number of attempts if it receives output that isn't the name of a downstream task. So there shouldn't be much finickiness.
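For context, usage looks roughly like this - parameter names are from memory, so check the README for the exact signature - the decorated function's return value becomes the prompt, and the LLM's answer has to match one of the downstream task names:

    from airflow.decorators import dag, task

    @dag(schedule=None, catchup=False)
    def ticket_routing():
        @task.llm_branch(
            model="gpt-4o-mini",
            system_prompt="Route the ticket to exactly one downstream task.",
        )
        def route(ticket: str) -> str:
            return ticket

        @task
        def handle_billing(ticket: str): ...

        @task
        def handle_technical(ticket: str): ...

        route("My invoice is wrong") >> [
            handle_billing("My invoice is wrong"),
            handle_technical("My invoice is wrong"),
        ]

    ticket_routing()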
I appreciate the distinction between agents and workflows - this seems to be commonly overlooked and in my opinion helps ground people in reliability vs capability. Today (and in the near future) there's not going to be "one agent to rule them all", so these LLM workflows don't need to be incredibly capable. They just need to do what they're intended to do _reliably_ and nothing more.
I've started taking a very data engineering-centric approach to the problem: treat an LLM call like any other API call in a pipeline, and it's crazy (or maybe not so crazy) what LLM workflows are capable of, all with increased reliability. So much so that I've tried to package my thoughts / opinions up into an AI SDK for Apache Airflow [1] (one of the more popular orchestration tools that data engineers use). This feels like the right approach, and in our customer base / community it also maps perfectly to the organizations that have been most successful. The number of times I've seen companies stand up an AI team without really understanding _what problem they want to solve_...
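To give a flavor of what "the LLM is just another step in the pipeline" looks like with the SDK (decorator and parameter names here are approximate - see the repo for the real API):

    from airflow.decorators import dag, task
    from pydantic import BaseModel

    class Summary(BaseModel):
        headline: str
        sentiment: str

    @dag(schedule=None, catchup=False)
    def feedback_pipeline():
        @task
        def get_feedback() -> str:
            return "The product is great but onboarding was confusing."

        # the LLM step is just another task: it gets Airflow's retries,
        # logging, scheduling, and lineage like everything else
        @task.llm(model="gpt-4o-mini", result_type=Summary,
                  system_prompt="Summarize the customer feedback.")
        def summarize(feedback: str) -> str:
            return feedback

        summarize(get_feedback())

    feedback_pipeline()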
We've built an SDK for building DAGs / data pipelines with LLMs in Apache Airflow [1] using Pydantic AI [2] under the hood. I've seen success across the board with Airflow users building simple LLM workflows before moving on to "AI agents". In my experience, the noise around building agents means that people forget that there are other ways to get more immediate value out of LLMs.
Coupling Airflow for orchestration and Pydantic AI for LLM interactions has turned out to be a very pragmatic approach to building these workflows (and agents). Neither tool "gets in the way" of what you're trying to do. Airflow's been around for 10+ years and has a very well-built orchestration engine rich with everything you need to write production-grade data pipelines, and Pydantic AI's been a refreshing take on working with LLMs.
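Even without the SDK, the two compose nicely - a plain Pydantic AI agent inside a regular Airflow task gets you most of the way there. A minimal sketch (the output_type / result.output naming varies a bit across Pydantic AI versions):

    from airflow.decorators import dag, task
    from pydantic import BaseModel
    from pydantic_ai import Agent

    class TicketTriage(BaseModel):
        category: str
        urgency: int

    triage_agent = Agent("openai:gpt-4o-mini", output_type=TicketTriage)

    @dag(schedule=None, catchup=False)
    def triage_pipeline():
        @task
        def triage(ticket: str) -> dict:
            # Airflow owns retries/observability; Pydantic AI owns the LLM
            # call and validates the structured output
            result = triage_agent.run_sync(ticket)
            return result.output.model_dump()

        triage(ticket="Checkout page returns a 500 for some users")

    triage_pipeline()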
Astronomer | Software Engineer, Office of the CTO | NYC | astronomer.io
Astronomer is a Series C data infra startup building a data ops platform for our customers on top of Apache Airflow. Airflow is one of the largest open-source data projects, allowing users to write, run, and scale data pipelines in Python. It's downloaded over 30m times a month!
The core of this role is to experiment with new ideas and build prototypes. As a Software Engineer in the Office of the CTO, you will act as a “hacker in residence” - someone with the freedom to experiment, explore big ideas, and validate concepts. You’ll be a generalist, working with whatever technology, tools, and programming languages are required to validate an idea. For the ideas that do work out, we work hand-in-hand with the broader product and engineering teams to turn the idea into a customer-facing feature or new product.
There’s a particular focus on how Generative AI will change the lives of data professionals over the next 3-5 years. We want to lead the market with both a perspective and set of products on how AI will evolve from a human-driven copilot to a set of fully autonomous agents.
https://github.com/astronomer/airflow-ai-sdk