Hello team. I am trying to optimise metaflow infra...
# ask-metaflow
q
Hello team. I am trying to optimise metaflow infrastructure for running tasks at very high volume (let’s say each step takes only <10s and there is a large fanout of steps). Let’s disregard whether this is a right thing to do from an application architecture wise, the bottleneck I am current facing is that at the start of step, metaflow seems to run these commands (for context I am on AWS SFN + Batch over EC2)
Copy code
mflog 'Setting up task environment.' && python -m pip install awscli requests boto3 -qqq && ...
If my EC2 machine is overloaded with more than 30-40 tasks this step starts taking around 1 minute due to congested network. As you can imagine I am reluctant to pay an overhead of 1 minute when my total task runtime is <10s. Do you guys have any ideas on how I can avoid this install? We package all our application-level dependencies via a docker container, I am willing to package these dependnecies into it as well if there’s a way to disable this statement.
1