# ask-metaflow
f
Hi, we're using a custom Docker image to set up our environment. The model runs for one epoch, but at the end of the epoch, when validation tries to load images, we get a "No space left on device" error. We have enough disk space and memory allocated for the experiment. Surprisingly, it works fine on our on-prem compute facilities.
d
Is there any way to integrate it as part of
python sample_flow.py step-functions create
? Or do you have to wrap it in another script and look up the EC2 launch template after the step function is created so you can modify it?
a
You could create a script that modifies the launch template when a failure is detected and then resubmits the workload.
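A rough sketch of what such a script might look like, in case it helps. This is not an official Metaflow feature, and several things here are assumptions: that the change needed is a larger EBS volume in the launch template, that the template ID and device name are known (the function names and parameters below are hypothetical), and that the flow is resubmitted with `step-functions trigger`. How the failure is detected is left to the caller.

```python
import subprocess
import sys


def build_version_request(template_id, source_version, device_name, size_gib):
    """Build the arguments for a new launch template version with a larger EBS volume.

    Fields given in LaunchTemplateData override those of the source version.
    """
    return {
        "LaunchTemplateId": template_id,
        "SourceVersion": str(source_version),
        "LaunchTemplateData": {
            "BlockDeviceMappings": [
                {"DeviceName": device_name, "Ebs": {"VolumeSize": size_gib}}
            ]
        },
    }


def bump_volume_size(template_id, device_name, size_gib):
    """Create a new template version with the larger volume and make it the default."""
    import boto3  # imported lazily; requires AWS credentials at call time

    ec2 = boto3.client("ec2")
    # Look up the latest version so the new one is based on it.
    template = ec2.describe_launch_templates(LaunchTemplateIds=[template_id])
    latest = template["LaunchTemplates"][0]["LatestVersionNumber"]
    response = ec2.create_launch_template_version(
        **build_version_request(template_id, latest, device_name, size_gib)
    )
    new_version = response["LaunchTemplateVersion"]["VersionNumber"]
    ec2.modify_launch_template(
        LaunchTemplateId=template_id, DefaultVersion=str(new_version)
    )
    return new_version


def resubmit(flow_file="sample_flow.py"):
    """Re-trigger the already deployed flow (assumes `step-functions trigger`)."""
    subprocess.run([sys.executable, flow_file, "step-functions", "trigger"], check=True)
```

When a failure is detected, the wrapper would call something like `bump_volume_size("lt-...", "/dev/xvda", 200)` followed by `resubmit()`. The template ID, device name, and size are placeholders. Whether new instances actually pick up the new default version depends on how your compute environment references the launch template, so check that separately.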