# ask-metaflow
k
Hi all, I am also running into this issue: too many Batch job submission requests (`TooManyRequestsException`) when attempting a Distributed Map-mode SFN fanout with 10,000+ tasks and 5000 workers. If this is just unfortunately expected at this scale, I can scale back. I'd also be happy to create an issue. I'm using `metaflow==2.11.10`. What would you advise?
a
AWS Batch has a global limit on the rate at which jobs can be submitted to it, across all workloads (whether Metaflow or non-Metaflow). One way to get better traffic shaping is to use the `--max-workers` argument so that Metaflow sends fewer jobs to AWS Batch. This may work for you, but it is not entirely guaranteed given the global nature of the limit AWS imposes here. Your AWS TAM may be able to lift this limit. This was one of the reasons we added support for Kubernetes, where we can work around these limitations. Kubernetes has its own set of issues at scale, but it at least gives us the affordance to address them directly.
k
Hi Savin, thanks for the response. Decreasing `--max-workers` from 5000 to 2000 seems to have avoided this limit in my case. For reference, I was in all likelihood the only source of Batch job submissions at the account level at the time.
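For anyone reading later, here is a rough toy model of why lowering a worker cap helps. It is not Metaflow's or Step Functions' implementation; `run_fanout` and `fake_submit` are hypothetical names, and the sleep is only a stand-in for a real job submission. It just shows that a cap on concurrent workers bounds how many submissions can be in flight at once, whatever the total task count is.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_fanout(num_tasks, max_workers):
    """Run num_tasks fake submissions with a worker cap.

    Returns (number of completed tasks, peak number in flight at once).
    """
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def fake_submit(task_id):
        # Hypothetical stand-in for a job submission; no AWS call is made.
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.001)  # pretend the submission takes a moment
        with lock:
            in_flight -= 1
        return task_id

    # The pool never runs more than max_workers submissions concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fake_submit, range(num_tasks)))
    return len(results), peak


if __name__ == "__main__":
    completed, peak = run_fanout(num_tasks=200, max_workers=20)
    print(f"completed={completed}, peak in flight={peak}")
```

All tasks still complete; only the burst size changes. As noted above, the AWS limit applies account-wide, so a lower cap reduces pressure but does not guarantee staying under it.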