Outerbounds #ask-metaflow


kind-horse-40048

01/17/2025, 12:17 PM

Hi all, I am also running into this issue (too many Batch job submission requests, `TooManyRequestsException`) when attempting a Distributed Map-mode SFN fanout with 10,000+ tasks and 5000 workers. If this is just unfortunately expected at this scale, I can scale back. I'd also be happy to create an issue. I'm using `metaflow==2.11.10`. What would you advise?

ancient-application-36103

01/17/2025, 12:44 PM

AWS Batch has a global limit on the rate at which jobs can be submitted to it, shared across all workloads (Metaflow or not). One way to shape that traffic is to use the `--max-workers` argument so that Metaflow keeps fewer jobs in flight on AWS Batch at once. This may work for you, but it is not guaranteed, given the global nature of the limit AWS imposes here. Your AWS TAM may be able to lift this limit. This was one of the reasons we added support for Kubernetes, where we can work around these limitations: Kubernetes has its own set of issues at scale, but it at least gives us the affordance to address them directly.
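To illustrate why capping workers shapes submission traffic, here is a minimal stdlib-only analogy, not Metaflow's internals: a thread pool whose `max_workers` bounds how many submissions can be in flight at once. `SubmissionTracker`, `submit_job`, and `fan_out` are hypothetical names, and the sleep is only a stand-in for the latency of a real AWS Batch submission.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SubmissionTracker:
    """Hypothetical stand-in for AWS Batch: records concurrent 'submissions'."""

    def __init__(self):
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0
        self.completed = 0

    def submit_job(self, task_id):
        with self._lock:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
        time.sleep(0.01)  # stand-in for the latency of one submission call
        with self._lock:
            self.in_flight -= 1
            self.completed += 1
        return task_id


def fan_out(num_tasks, max_workers):
    """Run num_tasks fake submissions with at most max_workers in flight."""
    tracker = SubmissionTracker()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(tracker.submit_job, range(num_tasks)))
    return tracker, results


if __name__ == "__main__":
    for workers in (50, 20):
        tracker, _ = fan_out(num_tasks=200, max_workers=workers)
        print(f"max_workers={workers}: peak in-flight submissions = {tracker.peak}")
```

The cap only bounds this client's own concurrency; as noted above, it cannot control other workloads drawing on the same account-wide limit.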


kind-horse-40048

01/17/2025, 5:35 PM

Hi Savin, thanks for the response. Decreasing `--max-workers` from 5000 to 2000 seems to have avoided this limit in my case. For reference, I was in all likelihood the only source of Batch job submissions at the account level at the time.
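A back-of-the-envelope sketch of the trade-off, under a deliberately simplified model (an assumption, not measured behavior): each foreach task is one Batch job submission, `--max-workers` caps concurrent tasks, and every task takes the same time, so jobs run in back-to-back rounds. `fanout_profile` is a hypothetical helper.

```python
import math


def fanout_profile(num_tasks, max_workers):
    """Simplified model: equal-duration tasks, one Batch submission per task."""
    peak_burst = min(num_tasks, max_workers)     # submissions issued at the start
    rounds = math.ceil(num_tasks / max_workers)  # back-to-back rounds under this model
    return peak_burst, rounds


for workers in (5000, 2000):
    burst, rounds = fanout_profile(10_000, workers)
    print(f"--max-workers {workers}: initial burst {burst} submissions, {rounds} rounds")
```

Under these assumptions, going from 5000 to 2000 workers shrinks the initial burst by 2.5x while stretching the run from 2 rounds to 5, so the lower cap trades some wall-clock time for less submission pressure.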

