handsome-ocean-79057
01/21/2025, 6:25 PM
{{attempt}} to be 0
b. {{attempt}} is passed into command as MF_ATTEMPT=0
c. metaflow overwrites files in s3 for attempt 0 instead of creating a new attempt 4
5. next task in DAG fails because it cannot fetch the artifacts from the latest attempt, and instead fetches artifacts from failed attempt=3
Is this something you are aware of already? Can we figure out an option to be able to do manual retries with argo?
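To make the reported sequence concrete, here is a minimal sketch of the failure mode. The per-attempt prefix layout and the helper functions are assumptions for illustration, not Metaflow's actual datastore code; only MF_ATTEMPT comes from the report above.

```python
# Illustration only -- the prefix layout and helpers are assumptions,
# not Metaflow internals.
import os

# Per-attempt artifact prefixes that already exist after four attempts (0-3),
# where attempt 3 is the one that failed.
existing_attempts = {
    0: "s3://bucket/FlowName/run/step/task/0/",
    1: "s3://bucket/FlowName/run/step/task/1/",
    2: "s3://bucket/FlowName/run/step/task/2/",
    3: "s3://bucket/FlowName/run/step/task/3/",  # failed attempt
}

def run_task(attempt: int) -> None:
    """Pretend to run the task: write artifacts under that attempt's prefix."""
    prefix = f"s3://bucket/FlowName/run/step/task/{attempt}/"
    existing_attempts[attempt] = prefix  # attempt 0 is overwritten, not attempt 4

def latest_attempt() -> int:
    """Downstream tasks resolve 'latest' as the highest attempt number present."""
    return max(existing_attempts)

# A manual Argo retry re-executes the pod with MF_ATTEMPT=0 ...
os.environ["MF_ATTEMPT"] = "0"
run_task(int(os.environ["MF_ATTEMPT"]))

# ... so the next task still resolves attempt 3 (the failed one) as the latest.
assert latest_attempt() == 3
```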
ancient-application-36103
01/21/2025, 6:31 PM
ancient-application-36103
01/21/2025, 6:31 PM
handsome-ocean-79057
01/21/2025, 6:33 PM
@retry in some cases since all attempts go to 0. Will take a look at what options we have here then
handsome-ocean-79057
02/06/2025, 4:46 PM
--retry-count argument. Would you have any concerns with that approach?
Follow-up question: It seems like metaflow iterates over s3 files to find the latest attempt number. Couldn't we also fetch this from the metaflow backend metadata? Is there any preference for whether s3 or the metaflow api should be the source of truth?
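Roughly, the two candidate sources of truth compared in the question could look like this. The bucket layout, key pattern, and metadata endpoint shape are assumptions for illustration, not Metaflow's actual storage scheme or service API; only the boto3 and requests calls themselves are standard.

```python
# Sketch of the two candidate lookups for the latest attempt number.
import re

import boto3
import requests

def latest_attempt_from_s3(bucket: str, task_prefix: str) -> int:
    """Derive the latest attempt by listing per-attempt objects under the task prefix."""
    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=task_prefix)
    attempts = []
    for obj in resp.get("Contents", []):
        # Assumes keys look like  <task_prefix><attempt>/<artifact>
        match = re.match(rf"{re.escape(task_prefix)}(\d+)/", obj["Key"])
        if match:
            attempts.append(int(match.group(1)))
    return max(attempts, default=0)

def latest_attempt_from_metadata(service_url: str, pathspec: str) -> int:
    """Ask the metadata backend instead; the endpoint shape here is made up."""
    resp = requests.get(f"{service_url}/tasks/{pathspec}/attempts", timeout=10)
    resp.raise_for_status()
    return max(record["attempt_id"] for record in resp.json())
```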
ancient-application-36103
02/06/2025, 4:58 PM
ancient-application-36103
02/06/2025, 4:58 PM
handsome-ocean-79057
02/06/2025, 5:57 PM
ancient-application-36103
02/06/2025, 6:24 PM
ancient-application-36103
02/06/2025, 6:25 PM
handsome-ocean-79057
02/07/2025, 8:30 AM
ancient-application-36103
02/10/2025, 9:17 PM
ancient-guitar-13766
02/18/2025, 11:45 AM
retry_count from the step command-line interface.
We tested the solution for an error in a linear flow and plan to spend more time covering additional use cases.
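A rough, standalone illustration of that idea: an explicit --retry-count option on the step command, falling back to MF_ATTEMPT when it is absent. This is written as a generic click command to show the shape of the approach, not the actual Metaflow change being tested.

```python
# Standalone click example of wiring an explicit retry count into a step
# command; it mirrors the idea of the fix, not the real Metaflow CLI code.
import os

import click

@click.command()
@click.option("--retry-count", type=int, default=None,
              help="Attempt number supplied by the orchestrator (e.g. Argo).")
def step(retry_count):
    # Fall back to the MF_ATTEMPT environment variable when the flag is absent.
    if retry_count is None:
        retry_count = int(os.environ.get("MF_ATTEMPT", "0"))
    click.echo(f"running step as attempt {retry_count}")

if __name__ == "__main__":
    step()
```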
ancient-guitar-13766
02/18/2025, 11:47 AM
current.retry_count, with our solution applied:
For an Argo-retried workflow, the retry_count appears to update correctly. In this case, the flow was retried by Argo after three failed attempts.
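For context, current.retry_count can be observed with a small flow whose step fails on its early attempts. The flow below is a generic sketch using the standard @retry decorator and current object; it is not the flow that was tested above.

```python
# Generic example of observing current.retry_count across attempts.
from metaflow import FlowSpec, current, retry, step

class RetryCountFlow(FlowSpec):

    @retry(times=3)
    @step
    def start(self):
        print(f"attempt number: {current.retry_count}")
        if current.retry_count < 2:
            # Fail the first two attempts so retries actually happen.
            raise RuntimeError("failing so the next attempt runs")
        self.next(self.end)

    @step
    def end(self):
        print("done")

if __name__ == "__main__":
    RetryCountFlow()
```

Running it (e.g. python retry_count_flow.py run) prints retry_count incrementing from 0 to 2 across attempts; @retry(times=...) bounds how many retries Metaflow itself schedules for the step.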
ancient-application-36103
02/20/2025, 5:46 PM
retry till the current max-attempts are reached - wdyt?
handsome-ocean-79057
02/21/2025, 8:09 AM