I am running into the infamous `An error occurred TooManyReq Outerbounds #ask-metaflow

brave-fall-84099

02/20/2025, 2:38 PM

I am running into the infamous `An error occurred (TooManyRequestsException) when calling the DescribeJobDefinitions operation (reached max retries: 4): Too Many Requests` when fanning out to ~100 steps. I see some open PRs about the subject. I am wondering about two things:
• Are there config values (max retries, backoff?) that I can play with?
• Can I catch this issue happening in the Runner API? I would be happy if the whole run fails immediately (or if I can make it fail).
The annoying thing is that 2 out of 100 jobs fail to even be registered, yet the whole run keeps going and crashes before the join because some jobs were never registered, wasting a lot of time.

brave-fall-84099

02/20/2025, 2:49 PM

Similarly with other step-level errors, like running into a limit on requesting secrets via `@secrets`. How can I catch this / make note of it? Currently I can only figure out why 2 / 100 tasks are stillborn because I happen to see the error fly by in my console output.
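One way to avoid relying on spotting the error as it scrolls past is to scan the captured console output for throttling messages and fail as soon as one appears. The following is a minimal sketch under stated assumptions: it is not a Metaflow API, the helper names `find_throttle_errors` and `fail_fast` are hypothetical, and the pattern list is based only on the error quoted in this thread plus the generic AWS `ThrottlingException` name.

```python
import re

# Assumption: these two names cover the throttling errors of interest.
# Only TooManyRequestsException is quoted in this thread; extend as needed.
THROTTLE_PATTERN = re.compile(r"TooManyRequestsException|ThrottlingException")


def find_throttle_errors(lines):
    """Return (line_number, text) pairs for lines that mention a throttling error."""
    hits = []
    for number, text in enumerate(lines, start=1):
        if THROTTLE_PATTERN.search(text):
            hits.append((number, text.strip()))
    return hits


def fail_fast(lines):
    """Raise RuntimeError for the first throttling error found in the output."""
    hits = find_throttle_errors(lines)
    if hits:
        number, text = hits[0]
        raise RuntimeError(f"throttling error on line {number}: {text}")
```

Feeding this the lines of a saved log file (or the output captured from however the run was launched) would turn a silently unregistered task into an immediate, visible failure.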

ancient-application-36103

02/20/2025, 4:30 PM

`TooManyRequestsException` for `DescribeJobDefinitions` is unfortunately due to a global AWS limit. A better bet would be to get those limits raised by working with your AWS TAM.
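For context on the "max retries, backoff" part of the question, the general technique for coping with a throttled API is to retry with exponential backoff and jitter. The sketch below only illustrates that technique. It is not how Metaflow or botocore implement retries, this thread does not confirm any Metaflow setting for it, and `Throttled` and `call_with_backoff` are hypothetical names.

```python
import random
import time


class Throttled(Exception):
    """Hypothetical stand-in for a throttling error such as TooManyRequestsException."""


def call_with_backoff(fn, max_attempts=8, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on Throttled with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Throttled:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random fraction of the ceiling so that
            # ~100 parallel callers do not all retry at the same instant.
            sleep(rng() * ceiling)
```

Backoff lowers the chance of hitting the limit but does not remove it; if the account-wide quota is too low for the fan-out, retries only delay the failure.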

ancient-application-36103

02/20/2025, 4:31 PM

Can you help us with the error that you are running into with `@secrets`?

ancient-application-36103

02/20/2025, 4:34 PM

Also, unfortunately never found time to implement this, which would reduce the probability of running into this issue.
