# ask-metaflow
Have a perplexing problem... Intermittently, during environment bootstrapping, I'm seeing errors about `Temporary failure in name resolution` (full error log here).

My initial theory was that we were overloading a particular instance. The job we were running was set to have 2 CPUs and 2 GB of RAM and schedules ~20 jobs in its parallel step. However, whether these jobs all get scheduled on the same node or across multiple nodes, the same error pops up. It's also odd in that it doesn't fail at the same point every time: occasionally it fails when pip installing `requests`, other times when installing `awscli`, etc.

I tried to see if it was an issue with Route 53 DNS throttling. I followed the steps described here, but again didn't see any indication that this was the problem.

Any thoughts on where to go from here? Right now our jobs have about a 10% failure rate.
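
For anyone wanting to reproduce this outside of pip, here is a rough sketch of a probe that could be run inside a task to measure how often lookups fail. It only uses Python's standard `socket` module; the function name `probe_dns`, its parameters, and the hostnames in the usage note are my own assumptions for illustration, not anything Metaflow provides.

```python
import socket
import time
from collections import Counter


def probe_dns(hostnames, attempts=5, delay=0.0, resolver=socket.getaddrinfo):
    """Resolve each hostname `attempts` times; return the failure rate per host.

    `resolver` is injectable (hypothetical parameter) so the probe can be
    exercised without network access.
    """
    failures = Counter()
    for _ in range(attempts):
        for host in hostnames:
            try:
                resolver(host, 443)
            except socket.gaierror:
                # Name resolution errors from getaddrinfo surface as gaierror
                failures[host] += 1
        if delay:
            # Space out rounds so the probe itself does not hammer the resolver
            time.sleep(delay)
    return {host: failures[host] / attempts for host in hostnames}
```

Calling something like `probe_dns(["pypi.org", "files.pythonhosted.org"], attempts=20, delay=0.5)` at the start of a step would give a per-host failure rate that could be compared against the ~10% job failure rate, assuming those are the hosts pip is actually contacting.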