# ask-metaflow
Hello! I have an issue when running Metaflow on AWS Batch. I use the @pypi decorator to generate an isolated environment with the packages each step needs. Locally, the environment is bootstrapped and everything works correctly. It used to work on AWS Batch as well, but now (we might have cleared the environment cache on S3) the task starts and then stays stuck in the "RUNNING" state. According to the AWS Batch logs, it first takes around 10 minutes to download the code package, which used to be much faster. Bootstrapping the environment then takes around 1 hour and eventually fails because Micromamba cannot be installed, with this error:
2024-11-24T15:59:35.701+01:00 Setting up task environment.
2024-11-24T15:59:40.498+01:00 Downloading code package...
2024-11-24T16:07:41.254+01:00 Code package downloaded.
2024-11-24T16:07:41.287+01:00 Task is starting.
2024-11-24T16:07:41.631+01:00 Bootstrapping virtual environment...
2024-11-24T17:16:36.901+01:00 Bootstrap failed while executing: set -e;
2024-11-24T17:16:36.902+01:00 if ! command -v micromamba >/dev/null 2>&1; then
2024-11-24T17:16:36.902+01:00 mkdir -p micromamba;
2024-11-24T17:16:36.902+01:00 python -c "import requests, bz2, sys; data = requests.get('https://micro.mamba.pm/api/micromamba/linux-64/1.5.7').content; sys.stdout.buffer.write(bz2.decompress(data))" | tar -xv -C $(pwd)/micromamba bin/micromamba --strip-components 1;
2024-11-24T17:16:36.902+01:00 export PATH=$PATH:$(pwd)/micromamba;
2024-11-24T17:16:36.902+01:00 if ! command -v micromamba >/dev/null 2>&1; then
2024-11-24T17:16:36.902+01:00 echo "Failed to install Micromamba!";
2024-11-24T17:16:36.902+01:00 exit 1;
2024-11-24T17:16:36.902+01:00 fi;
2024-11-24T17:16:36.902+01:00 Stdout:
2024-11-24T17:16:36.902+01:00 Stderr: Traceback (most recent call last):
2024-11-24T17:16:36.902+01:00 File "<string>", line 1, in <module>
2024-11-24T17:16:36.902+01:00 File "/usr/local/lib/python3.11/bz2.py", line 333, in decompress
2024-11-24T17:16:36.902+01:00 res = decomp.decompress(data)
2024-11-24T17:16:36.902+01:00 ^^^^^^^^^^^^^^^^^^^^^^^
2024-11-24T17:16:36.902+01:00 OSError: Invalid data stream
2024-11-24T17:16:36.902+01:00 tar: This does not look like a tar archive
2024-11-24T17:16:36.902+01:00 tar: bin/micromamba: Not found in archive
2024-11-24T17:16:36.902+01:00 tar: Exiting with failure status due to previous errors
I use the default image created by Metaflow on AWS Batch (with Python version 3.11). Any ideas on what is causing this? Thanks in advance.
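For anyone hitting the same traceback: `OSError: Invalid data stream` means the bytes handed to `bz2.decompress` were not a valid bzip2 stream, so whatever came back from the download was not the Micromamba archive. Below is a minimal diagnostic sketch, not Metaflow's bootstrap code. The helper names `fetch` and `decompress_payload` are hypothetical, and it uses `urllib` from the standard library instead of `requests`. It checks the bzip2 magic bytes first so the real problem shows up in the error message.

```python
import bz2
import urllib.request

# URL taken from the bootstrap command in the log above.
MICROMAMBA_URL = "https://micro.mamba.pm/api/micromamba/linux-64/1.5.7"
BZ2_MAGIC = b"BZh"  # every bzip2 stream starts with these three bytes


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw response body (hypothetical helper).

    urlopen raises HTTPError for non-2xx responses and URLError on
    connection problems, so network failures surface here.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def decompress_payload(data: bytes) -> bytes:
    """Decompress bzip2 data, failing with a readable message otherwise."""
    if not data.startswith(BZ2_MAGIC):
        raise ValueError(
            f"response is not bzip2 data ({len(data)} bytes, "
            f"starts with {data[:80]!r})"
        )
    return bz2.decompress(data)


if __name__ == "__main__":
    # Run inside the container to see what the endpoint really returns.
    archive = decompress_payload(fetch(MICROMAMBA_URL))
    print(f"got {len(archive)} bytes of tar data")
```

Running this in the same image and network as the Batch job shows whether the download times out, returns an error page, or succeeds.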
I was able to partially solve the issue by creating a custom Docker image with Micromamba pre-installed. The task no longer fails, but bootstrapping still takes around 1 hour, which is far too long and much longer than it used to take. Any idea why?
Solved the problem. It was unrelated to Metaflow. Our VPC configuration prevented outbound IPv4 connections (we use a NAT64 service, since AWS now charges for IPv4 addresses but many web services still do not support IPv6). I am not sure why S3 was so badly affected and so slow, since it should support dual-stack... Anyway, sorry for bothering you guys.
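In case it helps someone debugging a similar setup, here is a rough sketch of how outbound IPv4 and IPv6 connectivity could be checked separately from inside the container. It is only an illustration built on the standard `socket` module. The function name `probe` is hypothetical, and the host is the Micromamba endpoint from the log above.

```python
import socket


def probe(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Try one TCP connection per address family and report the outcome."""
    results = {}
    for family, label in ((socket.AF_INET, "IPv4"), (socket.AF_INET6, "IPv6")):
        try:
            # Resolve the host for this address family only.
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror as exc:
            results[label] = f"no address ({exc})"
            continue
        address = infos[0][4]  # sockaddr tuple of the first result
        try:
            with socket.socket(family, socket.SOCK_STREAM) as sock:
                sock.settimeout(timeout)
                sock.connect(address)
            results[label] = "ok"
        except OSError as exc:  # covers timeouts and unreachable networks
            results[label] = f"connect failed ({exc})"
    return results


if __name__ == "__main__":
    for label, outcome in probe("micro.mamba.pm").items():
        print(f"{label}: {outcome}")
```

If one family times out while the other connects, clients that try the broken family first will stall on every request, which would be consistent with very slow downloads.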