Hi all, is anyone looking into a Dagster Metaflow ...
# dev-metaflow
c
Hi all, is anyone looking into a Dagster Metaflow plugin orchestrator? We are specifically interested in Dagster on K8s. Dagster is enticing for Data Engineering teams, while Metaflow is more alluring for AI teams. Hopefully we could share the same backend.
a
Very interesting! I haven't looked deeply into Dagster (besides their docs). What would be the major selling points of Dagster for data engineers at Zillow?
c
Good question, I'm following up with our data engineers 🙂 I believe it's for the following:
• Simple Pythonic SDK (as compared to Argo YAML)
• Rich scheduling and backfills
• Rich sensors
• UI that has parity with Airflow
  ◦ (schedules and backfills for example)
  ◦ job control (launch / retry)
• Can test on a laptop and in CI/CD
• Lots of integrations (maturity)
Basically, I view it as having feature parity with Airflow, along with the benefits of easy testing and a simple Pythonic SDK.
a
We have an Airflow integration for Metaflow now available (just waiting on support for `foreach`es). A Dagster integration would look quite similar (modulo details).
c
Are there decorators or features that are not supported in Airflow?
a
All features of Metaflow are supported in Airflow, except for nested foreaches (Airflow has no such concept).
c
I haven't used Airflow in years. Is `foreach` equivalent to an Airflow `TaskGroup`? This shows nested `TaskGroup` support on managed Astronomer: https://www.astronomer.io/guides/task-groups/
Ah, it doesn't iterate on a dynamic list, doh
a
Airflow 2.3.0 introduced limited support for dynamic task mapping, which is what Metaflow's foreach piggybacks on. Airflow currently limits the cardinality of maps. @hallowed-glass-14538 has more details on this.
c
@hallowed-glass-14538 does dynamic task mapping not support nested foreach using `expand()`?
h
Airflow doesn't have nested foreach support with task mapping
Only map-reduce style operations
They had nested foreach support on their roadmap though
Just don't know when it will be released
They are trying to modify task_groups to support nested foreaches. You can check it here in AIP-42
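For anyone skimming the thread, here is a rough sketch of the difference between a single-level map-reduce and a nested foreach. It is plain Python only, not Airflow or Metaflow API, and the function names (`map_reduce`, `nested_foreach`) are made up for illustration. The key point is that in the nested case the inner list depends on the outer item, so the second fan-out is only known at runtime.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")
R = TypeVar("R")


def map_reduce(items: Iterable[T],
               mapper: Callable[[T], R],
               reducer: Callable[[List[R]], R]) -> R:
    """Single-level fan-out: one mapped call per item, then one join."""
    return reducer([mapper(item) for item in items])


def nested_foreach(outer_items: Iterable[T],
                   inner_items_for: Callable[[T], Iterable[U]],
                   mapper: Callable[[T, U], R],
                   inner_reducer: Callable[[List[R]], R],
                   outer_reducer: Callable[[List[R]], R]) -> R:
    """Two-level fan-out: each outer branch fans out again before joining."""
    outer_results = []
    for outer in outer_items:
        # The inner list is computed from the outer item, so its size
        # is not known until this branch actually runs.
        inner_results = [mapper(outer, inner) for inner in inner_items_for(outer)]
        outer_results.append(inner_reducer(inner_results))
    return outer_reducer(outer_results)


# One level of fan-out over a list, then a single reduce.
flat_total = map_reduce([1, 2, 3], lambda x: x * 10, sum)

# Two levels: the inner range depends on the length of the outer string.
nested_total = nested_foreach(
    ["a", "bb"],
    lambda outer: range(len(outer)),
    lambda outer, inner: len(outer) + inner,
    sum,
    sum,
)
```

The first shape is the map-reduce pattern mentioned above; the second needs one join per nesting level, which is the part described in this thread as not yet supported by task mapping.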