Hi Alex! Happy to answer! In our tests we have noticed that scaling Airflow to thousands of DAGs (flows in Metaflow), thousands of workers, and millions of concurrent tasks can become a non-trivial effort fairly quickly. Many Metaflow users end up at that scale sooner than they expect - a scale we have been able to support well on AWS Step Functions and Argo Workflows. Another property of ML workflows is that they are often very wide (many concurrent tasks), long-running (on the order of weeks or months), or both - which is typically not a characteristic of the data engineering workloads that are Airflow's primary use case. If you don't have high-scale needs and your use cases fit within the envelope of what Airflow supports architecturally, Airflow would be a good bet. Also worth noting: we try to ensure that we are not integrating with any known fragile spots - for example, we don't rely on XComs to pass data around, which would have sorely limited the ability to scale.
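To make the XCom point concrete: Metaflow passes data between steps as artifacts persisted in its own datastore (e.g. S3), so the orchestrator never handles the payload itself. A minimal sketch - the flow and attribute names are just illustrative:

```python
from metaflow import FlowSpec, step

class ArtifactFlow(FlowSpec):
    # Instance attributes become "artifacts" stored in Metaflow's datastore,
    # so even large objects never flow through the scheduler (no XComs).

    @step
    def start(self):
        self.model_input = list(range(1_000_000))  # arbitrarily large data
        self.next(self.train)

    @step
    def train(self):
        # The artifact is loaded back from the datastore, not from the scheduler.
        print(len(self.model_input))
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    ArtifactFlow()
```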
Additionally, we have run into a few gotchas with Airflow that the Airflow community is actively addressing. For example, until recently we couldn't support foreaches on Airflow due to the lack of dynamic tasks (dynamic task mapping was introduced in Airflow 2.3.0). While we support foreaches now, Airflow still lacks proper support for nested foreaches - which may or may not be a concern for you.
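For reference, a foreach in Metaflow is a fan-out over a list of values, each branch running as its own task - this is what maps onto Airflow's dynamic task mapping. A toy sketch (names are illustrative):

```python
from metaflow import FlowSpec, step

class ForeachFlow(FlowSpec):

    @step
    def start(self):
        self.params = ["a", "b", "c"]
        # Fan out: one task per element of self.params.
        self.next(self.process, foreach="params")

    @step
    def process(self):
        # self.input holds the element assigned to this branch.
        self.result = self.input.upper()
        self.next(self.join)

    @step
    def join(self, inputs):
        self.results = [i.result for i in inputs]
        self.next(self.end)

    @step
    def end(self):
        print(self.results)

if __name__ == "__main__":
    ForeachFlow()
```

A nested foreach is simply a second fan-out inside one of the branches above, which is the pattern Airflow still struggles with.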
Another gap is event triggering - the Airflow community is working on data triggering, which is a related concept, but we have a much more battle-tested implementation of event triggering available on top of Argo Workflows.
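If it helps, this is roughly what event triggering looks like in Metaflow when the flow is deployed to Argo Workflows - a sketch assuming the @trigger decorator available in recent Metaflow releases, with a hypothetical event name:

```python
from metaflow import FlowSpec, step, trigger

# "data_updated" is a hypothetical event name; on Argo Workflows the
# deployed flow starts automatically whenever this event is published.
@trigger(event="data_updated")
class TriggeredFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    TriggeredFlow()
```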
At the end of the day, we try to ensure that the full feature set of Metaflow is available on all of these workflow orchestrators, but at times we are constrained by aspects of their design, which results in gaps in parity. Also, we are always open to considering integrations with other workflow orchestrators (Dagster, Prefect, Flyte, etc.) as they gain popularity.
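For context, the same flow code compiles to whichever backend you pick at deploy time; roughly (flow and file names are placeholders):

```python
# python myflow.py argo-workflows create
# python myflow.py step-functions create
# python myflow.py airflow create my_dag.py   # emits an Airflow DAG file
```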