# ask-metaflow
l
Had an unexpected conversation at work today. An exec does NOT want to use Metaflow, and instead wants to use Airflow both for DE and DS. When I asked how we'd make it easy for DS to kick off compute jobs with GPUs and track dependencies and make it easy to install them, he said we'll have our Airflow DAGs kick off compute jobs using a tool out of Netflix called Genie. (really he said we'd use a faster implementation of it, but he gave this as a reference) What does this group make of this? Metaflow (especially Outerbounds' distro) is so well thought out. I'm new to this idea and I'm not sure how to respond.
d
well being at Netflix, we have Genie too but I fail to see how it’s related to managing dependencies and what not.
genie is great for what it does
it’s not a replacement of Metaflow (nor the other way around btw)
i’d be curious as to why he doesn’t want to use metaflow. Can help us improve.
l
The architecture he described would use a proxy like genie for all DE and DS compute. So, you'd send it a query and it'd
• decide to run it on trino
• or on spark
• or on clickhouse
• or other compute engines
• or trigger a GPU-based job e.g. training a model
So something like Airflow would use an operator to reach out to genie saying "here's my zipped up python code, the deps it needs, etc. please run it"
I have little understanding of genie. But that last bullet feels weird to me. But his reasoning is: if we have an "orchestrator API" that we can submit even training jobs to, then we can add or swap out the compute piece whenever we want. So one day we could use AWS Sagemaker to do that compute, and another day we could move it on prem or use some other tool that can pick up and run that job.
It feels like a lot to build to me. And highly irregular (not industry standard by any means)
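For readers trying to picture the proxy idea described above, here is a minimal sketch in plain Python. It is not Genie's API or anyone's actual implementation; `JobRequest`, `ComputeProxy`, and the backend functions are all hypothetical names invented for illustration. It only shows the routing and "swap the compute piece" part. Note that the dependency list is just passed through: something still has to resolve and install those deps on the target.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class JobRequest:
    """What an orchestrator (e.g. an Airflow operator) might send to the proxy."""
    kind: str      # hypothetical job kinds, e.g. "sql" or "gpu_training"
    payload: str   # a query string, or a path to zipped Python code
    deps: List[str] = field(default_factory=list)  # declared here, NOT resolved here


# A backend is any callable that takes a job and returns a status string.
Backend = Callable[[JobRequest], str]


class ComputeProxy:
    """Hypothetical 'orchestrator API': routes each job kind to one backend."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, kind: str, backend: Backend) -> None:
        # Registering the same kind again swaps the backend for that kind.
        self._backends[kind] = backend

    def submit(self, job: JobRequest) -> str:
        if job.kind not in self._backends:
            raise ValueError(f"no backend registered for job kind {job.kind!r}")
        return self._backends[job.kind](job)


# Stand-in backends. Real ones would call Trino, SageMaker, an on-prem cluster, etc.
def trino_backend(job: JobRequest) -> str:
    return f"trino ran {job.payload}"


def sagemaker_backend(job: JobRequest) -> str:
    return f"sagemaker ran {job.payload} with deps {job.deps}"


def on_prem_backend(job: JobRequest) -> str:
    return f"on-prem ran {job.payload} with deps {job.deps}"


if __name__ == "__main__":
    proxy = ComputeProxy()
    proxy.register("sql", trino_backend)
    proxy.register("gpu_training", sagemaker_backend)

    job = JobRequest(kind="gpu_training", payload="train.zip", deps=["torch"])
    print(proxy.submit(job))

    # Swapping the compute piece is one line for the caller...
    proxy.register("gpu_training", on_prem_backend)
    print(proxy.submit(job))
```

The routing layer itself is small. The work hidden behind each stand-in backend (packaging code, installing deps, provisioning GPUs, tracking results) is the part that would actually have to be built.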
d
I can’t claim huge knowledge of genie but for us it’s definitely much more used to run data jobs, not general compute. I guess it could work for general compute but that’s definitely not how we use it here. But I still don’t think genie takes care of the “deps” part of the story. You still have to resolve that yourself.
h
I don't think Metaflow vs Airflow is even the right comparison -- you could deploy your flows using Metaflow onto Airflow. Also regarding swapping out the compute piece, Metaflow does that via decorators (e.g. you could have a @sagemaker_training decorator).
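To illustrate the decorator point, here is a rough stdlib-only sketch of the general pattern: the step's code stays the same and a decorator decides where it runs. This is not Metaflow's implementation, and `run_on` and `BACKENDS` are hypothetical names made up for this example. In Metaflow itself the analogous decorators are things like `@batch` and `@kubernetes`; the `@sagemaker_training` decorator mentioned above is a hypothetical.

```python
import functools
from typing import Callable, Dict

# Hypothetical registry of compute backends. Each one receives the step
# function and is responsible for running it "somewhere"; here they just
# call it locally and tag the result so the routing is visible.
BACKENDS: Dict[str, Callable[[Callable[[], str]], str]] = {
    "local": lambda fn: f"[local] {fn()}",
    "sagemaker": lambda fn: f"[sagemaker] {fn()}",
}


def run_on(backend: str):
    """Decorator factory: route the decorated step to the named backend."""
    def decorator(fn: Callable[[], str]) -> Callable[[], str]:
        @functools.wraps(fn)
        def wrapper() -> str:
            return BACKENDS[backend](fn)
        return wrapper
    return decorator


@run_on("sagemaker")  # change this one line to move the step elsewhere
def train() -> str:
    return "model trained"


if __name__ == "__main__":
    print(train())
```

The step body never mentions the backend, so swapping compute is a change to the decorator line rather than to the training code.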