Thank you for sharing the Metaflow project with us. It is such an amazing tool!
We are experimenting with running Metaflow in an AKS cluster with Argo as the workflow engine. We had been managing project dependencies (internal and external) with Poetry, but for this prototype we have switched to using Conda because that is what we are seeing in the docs as the supported pattern. We now are seeing:
Setting up task environment.
Downloading code package...
Code package downloaded.
Task is starting.
Bootstrapping environment...
Environment bootstrapped.
With a simple flow of 4 steps that run instantly locally we are seeing runtime of about 3.5 minutes, just under a minute per step. We are thinking the majority of this time might be the bootstrapping of the environment where it is installing Conda packages but are not certain how to confirm this. If this is indeed the bottleneck, is there support for image caching for the container image that is doing the work so that it does not have to be built back up for each step?
Our use case is for parameterized calculation pipelines that will often foreach out to around 500 different steps and join back together and for different customers with different data sets this workflow could be run 1000 times in a day with different inputs. If the image bootstrapping was 30 seconds, this means we would be spending 1000*500*30 seconds on this. What is the best practice in reducing some of this bootstrapping overhead? Thanks again!