Hey Outerbounds, This might be a bit more AWS spe...
# dev-metaflow
c
Hey Outerbounds, This might be a bit more AWS specific, but out of nowhere we are getting errors on our batch queue that handles all our metaflow jobs, where they aren't able to transition to
start.
The container is getting a
DockerTimeoutError
and we are just using normal images ( whatever metaflow provisions ). Has this happened to anyone before and are there any steps we can take to fix it?
1
w
Happened to me before, was due to throttling on DockerHub. There's a thread in #C02116BBNTU
a
if that's the culprit, then it may help to use dockerhub mirror on AWS ECR, by setting
DEFAULT_CONTAINER_REGISTRY
to
public.ecr.aws/docker/library/
c
How do I set that @average-beach-28850? In the flow or the cfn somewhere?
Is this the recommended way to tackle the issue?
a
you can either set the environment variable
METAFLOW_DEFAULT_CONTAINER_REGISTRY
(usually that's what i do to test it) or set it in metaflow config at
~/.metaflowconfig/config.json
(it would be
DEFAULT_CONTAINER_REGISTRY
there)
c
Thanks, I will give it a try and report back!
👍 1
c
@careful-dress-39510 hey I am facing this exact issue as well. Did this fix it?
@brief-kite-90012