Outerbounds #ask-metaflow


quick-lighter-52296

10/03/2024, 10:16 AM

Hi team, I wanted to know if there's a way to impose conditions on Metaflow's retry behaviour. We use the @retry decorator to configure retries for many of our steps to protect against infrastructural failures, OOM issues and certain retryable application-level errors. But sometimes the application error is known to be non-retryable, and in such cases we end up needlessly wasting compute by retrying the step multiple times. Is there a way to annotate an exception with a non-retryable attribute so that Metaflow does not retry it? I know AWS Batch supports conditional retries in its retry strategy, where you can configure the retry to take place only for certain exit codes. That could theoretically be used to build such a feature, but I am not sure if there are any plans to support it.

dry-beach-38304

10/03/2024, 3:43 PM

No, this is not supported at this time. One thing you could do, though, especially for an application error, is catch it within your flow (using a try/except or a regular decorator around your step) so that the task itself does not fail.
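
For illustration, the try/except workaround might look roughly like the sketch below. The names `NonRetryableError` and `run_guarded` are hypothetical and not part of Metaflow. The helper swallows only the errors you mark as non-retryable and lets everything else propagate, so ordinary failures would still fail the task and go through @retry.

```python
class NonRetryableError(Exception):
    """Hypothetical marker for application errors that a retry cannot fix."""


def run_guarded(work, non_retryable=(NonRetryableError,)):
    """Call work() and return (result, None) on success.

    If work() raises one of the non_retryable exception types, return
    (None, message) instead of raising, so the task would not fail and
    would not be retried. Any other exception propagates unchanged.
    """
    try:
        return work(), None
    except non_retryable as exc:
        return None, f"{type(exc).__name__}: {exc}"


# Assumed usage inside a Metaflow step (sketch only, not executed here;
# do_training is a placeholder for your own code):
#
#     @retry(times=3)
#     @step
#     def train(self):
#         self.result, self.fatal_error = run_guarded(do_training)
#         self.next(self.check)
#
# A following step without @retry could then inspect self.fatal_error
# and raise once, failing the run without repeating the expensive work.
```

The step still reaches self.next in both cases, which is why the error is recorded as an artifact rather than re-raised in place.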

quick-lighter-52296

10/03/2024, 5:02 PM

Is this something that’s in the backlog or worth considering for you guys?

square-wire-39606

10/03/2024, 5:13 PM

if the application error is non-retryable, then you wouldn't want to retry the container against an infrastructural failure either?

quick-lighter-52296

10/04/2024, 8:15 AM

Yes, but only if the container failed due to the application-level error. Concretely speaking, if we could have the container exit with a special exit code in case of user-determined non-retryable errors, we could theoretically instrument it to detect when the retry need NOT be made. Wdyt?
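
To make the exit-code idea concrete, here is a rough sketch of the shape such a rule could take. The dictionary mirrors the `evaluateOnExit` structure of an AWS Batch retry strategy. Per the answer above, Metaflow does not expose this today, so wiring it into @retry is purely an assumption. The exit code 42 and the `should_retry` helper are hypothetical, and the helper is a simplified local model of the matching, not Batch's actual evaluator.

```python
# Hypothetical exit code the container would use for non-retryable errors.
NON_RETRYABLE_EXIT_CODE = 42

# Shape of an AWS Batch retry strategy with conditional rules.
RETRY_STRATEGY = {
    "attempts": 3,
    "evaluateOnExit": [
        {"onExitCode": str(NON_RETRYABLE_EXIT_CODE), "action": "EXIT"},
        {"onReason": "*", "action": "RETRY"},
    ],
}


def should_retry(exit_code, strategy=RETRY_STRATEGY):
    """Simplified first-match evaluation of the rules above.

    Only exact onExitCode matches and rules without an onExitCode
    (treated as catch-all) are modelled here.
    """
    for rule in strategy["evaluateOnExit"]:
        pattern = rule.get("onExitCode", "*")
        if pattern in ("*", str(exit_code)):
            return rule["action"].upper() == "RETRY"
    # No rule matched: fall back to retrying.
    return True
```

With these rules, an exit code of 42 stops further attempts, while any other exit code (for example 137 from an OOM kill) is retried up to the attempt limit.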

square-wire-39606

10/04/2024, 10:18 PM

theoretically, you might still miss out on scenarios where the retry should NOT be made if the container fails due to a platform issue at an inopportune moment.
