Hi, everyone!
We are evaluating Metaflow as the main workhorse in our company, and, as with many people before us, dependency installation is the major blocker.
We use Google Cloud to execute workflows, so our main option is to go with Argo Workflows. We also have plenty of internal Python packages, which makes the Conda approach a bit cumbersome: we don't use Conda now and are hesitant about introducing it into our stack.
I experimented with speeding up dependency installation via pip to bootstrap the execution environment as quickly as possible, but it still takes up to a couple of minutes; obviously, that is too much when we are talking about hundreds of parallel tasks.
I skimmed through the Metaflow sources to understand more about how it works, and I had a somewhat crazy thought that I'm trying to evaluate right now; I would appreciate any help with it.
The idea is to build container images on the fly with all dependencies baked into them, so they can be reused between workflow executions. It aligns nicely with our usage pattern, as we rarely upgrade the dependencies but execute many small tasks.
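To make that a bit more concrete, here is a rough sketch of the image-building part I have in mind. It assumes one requirements.txt per flow, the docker CLI being available at deploy time, and a made-up registry path; the tag is a hash of the dependency list, so an image is only built once per unique set of dependencies:

```python
import hashlib
import subprocess
from pathlib import Path

# Hypothetical Artifact Registry path; replace with whatever we actually use.
REGISTRY = "europe-docker.pkg.dev/my-project/metaflow"


def build_or_reuse_image(requirements: Path = Path("requirements.txt")) -> str:
    """Return an image tag keyed by the dependency set, building it only if missing."""
    digest = hashlib.sha256(requirements.read_bytes()).hexdigest()[:12]
    image = f"{REGISTRY}/flow-deps:{digest}"

    # If the image is already in the registry, just reuse it.
    if subprocess.run(["docker", "manifest", "inspect", image],
                      capture_output=True).returncode == 0:
        return image

    # Otherwise bake the dependencies into a fresh image and push it.
    dockerfile = (
        "FROM python:3.11-slim\n"
        "COPY requirements.txt /tmp/requirements.txt\n"
        "RUN pip install --no-cache-dir -r /tmp/requirements.txt\n"
    )
    subprocess.run(["docker", "build", "-t", image, "-f", "-", "."],
                   input=dockerfile.encode(), check=True)
    subprocess.run(["docker", "push", image], check=True)
    return image
```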
Conceptually, it could be done similarly to the @conda / @conda_base decorators Metaflow has, but it leaves two questions open so far:
1. How do we determine that a workflow is being executed locally? So far I think that for local execution we would prefer to just build a virtual environment locally for each workflow step and use those (this idea is also still being evaluated; a sketch of one possible check is below this list).
2. How can this container build process be injected into the Metaflow "scheduling" process?
The image itself can be provided by the @kubernetes decorator, which can be applied on the fly to the workflow step (see the second sketch below this list), but I'm not sure yet at what point we should build this image first.
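For question 1, the simplest heuristic I've come up with so far is to check for the service-account token that Kubernetes mounts into every pod (or the KUBERNETES_SERVICE_HOST variable it injects); the helper names are made up, and there may be a cleaner Metaflow-native way to do this:

```python
import os


def running_on_kubernetes() -> bool:
    # Kubernetes mounts a service-account token and injects this env var into every pod.
    return (
        os.path.exists("/var/run/secrets/kubernetes.io/serviceaccount/token")
        or "KUBERNETES_SERVICE_HOST" in os.environ
    )


def running_locally() -> bool:
    # "Local" here simply means "not inside a Kubernetes pod".
    return not running_on_kubernetes()
```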
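For question 2, this is roughly how I imagine the prebuilt image being wired into a step; the image string would come from something like the build_or_reuse_image() sketch above, and the open part is where in Metaflow's deploy/scheduling path that build should actually run:

```python
from metaflow import FlowSpec, kubernetes, step


class DepsBakedFlow(FlowSpec):
    """Toy flow whose start step runs inside a prebuilt image with all deps baked in."""

    # In practice the image tag would be computed at deploy time,
    # e.g. by build_or_reuse_image() from the sketch above.
    @kubernetes(image="europe-docker.pkg.dev/my-project/metaflow/flow-deps:abc123def456",
                cpu=1, memory=4096)
    @step
    def start(self):
        print("running inside the prebuilt image, no pip/conda bootstrap needed")
        self.next(self.end)

    @step
    def end(self):
        print("done")


if __name__ == "__main__":
    DepsBakedFlow()
```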
I would appreciate any thoughts on this idea, thanks in advance 🙂