# ask-metaflow
h
Is there an intuitive way to run a step if any of the previous steps fail? I need to run a final step to tear down infrastructure at the end of the run in case of failure. It looks like the `@catch` decorator is going to need to be added to all previous steps, but there are 18 previous steps in my DAG, so this adds a lot of boilerplate. Is there a simpler way to just say "run this tear-down step regardless of the status of all previous steps"?
h
You can add a catch to all your steps using `--with catch`
Another way I can think of would be to trigger your flow with another one that can then do your cleanup
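For the first suggestion, `--with` applies the decorator to every step at launch time without touching the flow code, e.g. `python my_flow.py run --with catch` (the file name here is illustrative). The same `--with` option should also be accepted when deploying, e.g. with `argo-workflows create`, though that's worth verifying against your Metaflow version.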
a
also curious what is the nature of cleanup? you can add a pure python decorator that wraps the user code in try ... except and does the clean up if except is triggered.
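As an illustration of that suggestion, a minimal sketch of such a wrapper; the `delete_pvc` helper is hypothetical, and the wrapper re-raises so the failure still propagates to Metaflow:

```python
import functools

def delete_pvc():
    ...  # hypothetical helper that deletes the PVC via the Kubernetes API

def teardown_pvc_on_failure(step_func):
    """Run the step's user code; delete the PVC if it raises."""
    @functools.wraps(step_func)
    def wrapper(self):
        try:
            return step_func(self)
        except Exception:
            delete_pvc()  # clean up the shared volume on any uncaught error
            raise         # re-raise so Metaflow still marks the task as failed
    return wrapper
```

Applied on each step that could fail while the PVC exists (it would typically sit under `@step`, closest to the function), any uncaught exception triggers the cleanup and the step still fails as usual.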
h
> also curious what is the nature of cleanup? you can add a pure python decorator that wraps the user code in try ... except and does the clean up if except is triggered.

The use case is standing up a PVC before a run begins, mounting it in all of the nodes in the DAG to share some data, and then destroying the PVC at the end of the run. Currently, if the run fails, the PVC will remain stood up because the final node responsible for deleting the PVC isn't reached.
This pipeline is automated and runs in Argo Workflows.
> Another way I can think of would be to trigger your flow with another one that can then do your cleanup
This approach definitely seems like a good one! Might try this out. Thanks!
h
Given your description, I feel like creating a step decorator to do the setup and teardown would be the cleaner way
h
Yeah, that would definitely feel like a much better approach. Are there concrete examples of how to do this, or should I just follow the approach taken here?
h
You probably want `task_pre_step` and `task_finished`. Caveat: there isn't fine-grained control over decorator ordering at the moment, so if you need your PVC available from within another decorator then things can get tricky due to the order of the setup/teardown: https://netflix.slack.com/archives/C02116BBNTU/p1730834209671669?thread_ts=1730831644.349479&cid=C02116BBNTU
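For orientation, a very rough sketch of a custom step decorator built on those hooks; the class, the helper functions, and the abbreviated hook signatures below are assumptions (custom decorators also have to be shipped as a Metaflow extension before `@pvc_lifecycle` can be used in a flow), so treat this as a shape rather than drop-in code:

```python
from metaflow.decorators import StepDecorator

def create_pvc():
    ...  # hypothetical helper that creates the PVC via the Kubernetes API

def delete_pvc():
    ...  # hypothetical helper that deletes the PVC via the Kubernetes API

class PVCLifecycleDecorator(StepDecorator):
    name = "pvc_lifecycle"  # how the decorator would be referenced in a flow

    def task_pre_step(self, *args, **kwargs):
        # Called before the step's user code runs
        # (the real hook receives the step name, datastore, flow, graph, ...)
        create_pvc()

    def task_finished(self, *args, **kwargs):
        # Called after the task, whether the user code succeeded or failed
        delete_pvc()
```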
h
Ah yeah, so then this wouldn't be a viable approach. My pods will only launch via the `@kubernetes` decorator if the PVC already exists. Otherwise, they'll hang in a `pending` state forever. Ordering is super important here.
This strikes me as the kind of thing that I would want a flow-level decorator for. The PVC is created in the first step, all subsequent nodes mount the same PVC, and it's finally torn down once processing is completed.
h
So in your particular case you need it to run before `@kubernetes` and tear down after? You could just have the decorator wrap `@kubernetes` then?
h
I have a sequence of `@kubernetes` decorators in my flow. The PVC needs to be created before the sequence starts, and deleted after the sequence ends.
One PVC serves every `@kubernetes` decorator in my flow.
h
Oh I see. Then yeah, this would be a good application for `flow_finalize` (which doesn't exist yet). Maybe @ancient-application-36103 has other ideas.
h
Thanks for taking a look! It actually does sound analogous to the approach taken here, so I might just take a stab at replicating that
h
Where do you set up the PVC right now? Also, I just read the link you sent and it does seem like what you want.
h
There's an initial step in the flow where a PVC is created, and an end step in the flow where that PVC is deleted
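That is, roughly this shape, where the failure mode is that `end` is skipped whenever an intermediate step raises (step names and the PVC helpers below are illustrative):

```python
from metaflow import FlowSpec, kubernetes, step

def create_pvc():
    ...  # hypothetical helper that creates the PVC via the Kubernetes API

def delete_pvc():
    ...  # hypothetical helper that deletes the PVC via the Kubernetes API

class SharedVolumeFlow(FlowSpec):

    @step
    def start(self):
        create_pvc()  # PVC must exist before any @kubernetes pod is scheduled
        self.next(self.process)

    @kubernetes
    @step
    def process(self):
        # ... work against files on the mounted PVC ...
        self.next(self.end)

    @step
    def end(self):
        delete_pvc()  # never reached if an earlier step fails

if __name__ == "__main__":
    SharedVolumeFlow()
```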
h
Although if you did catch the exception and tear down the PVC, won't the run continue to the next step, which expects the PVC to exist?
h
If any exception occurs, the intended behaviour would be to terminate the entire flow and delete the PVC
👍 1
a
@handsome-postman-16645 curious - why use a PVC and not use say S3 directly?
h
@square-wire-39606 There are nodes in my flow that run processing over large files by calling out to binaries (without Python bindings) on the OS via `subprocess`. These binaries take input files and produce output files, which are intermediate outputs in my flow. In this case, it doesn't really make sense to load each large intermediate file into memory in Python just to bring it into GCS/S3 via Metaflow.
For this specific case, it's just generally more convenient to have all files on a mounted storage device shared across tasks, rather than pass a bunch of references to input and output URIs in each task
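For illustration, a step along those lines might shell out like this (the binary name and mount path are made up):

```python
import subprocess

# Illustrative paths on the shared PVC mount, plus an illustrative binary name
INPUT_PATH = "/mnt/shared/inputs/sample.dat"
OUTPUT_PATH = "/mnt/shared/outputs/sample.processed.dat"

# The binary reads and writes files directly on the shared volume, so the
# large intermediate files never pass through Python memory or object storage.
subprocess.run(
    ["process-binary", "--input", INPUT_PATH, "--output", OUTPUT_PATH],
    check=True,  # raise CalledProcessError if the binary exits non-zero
)
```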
s
I see. It will likely be faster and cheaper to do it via GCS/S3, but if you want to tear down the PVC in case of any error, wrapping a simple try/except in a pure Python decorator should suffice.
h
Thanks! Definitely going to try out the pure Python decorator approach