# ask-metaflow
e
Note that Metaflow on Kubernetes with the default image is failing right now, most likely due to an upstream package that is installed into the Docker image at runtime. The error looks like the following:
2024-07-16 13:39:08.057 [192694/start/2593865 (pid 4043)] [pod t-f8605486-cl4b6-f9hhv] info     libmamba ****************** Backtrace Start ******************
2024-07-16 13:39:08.057 [192694/start/2593865 (pid 4043)] [pod t-f8605486-cl4b6-f9hhv] debug    libmamba Loading configuration
2024-07-16 13:39:08.057 [192694/start/2593865 (pid 4043)] [pod t-f8605486-cl4b6-f9hhv] trace    libmamba Compute configurable 'create_base'
2024-07-16 13:39:08.057 [192694/start/2593865 (pid 4043)] [pod t-f8605486-cl4b6-f9hhv] trace    libmamba Compute configurable 'no_env'
2024-07-16 13:39:08.057 [192694/start/2593865 (pid 4043)] [pod t-f8605486-cl4b6-f9hhv] trace    libmamba Compute configurable 'no_rc'
2024-07-16 13:39:08.057 [192694/start/2593865 (pid 4043)] [pod t-f8605486-cl4b6-f9hhv] trace    libmamba Compute configurable 'rc_files'
2024-07-16 13:39:08.057 [192694/start/2593865 (pid 4043)] [pod t-f8605486-cl4b6-f9hhv] trace    libmamba Compute configurable 'root_prefix'
2024-07-16 13:39:08.057 [192694/start/2593865 (pid 4043)] [pod t-f8605486-cl4b6-f9hhv] critical libmamba Could not use default root_prefix "/root/.local/share/mamba": Directory exists, is not empty and not a conda prefix.
2024-07-16 13:39:08.057 [192694/start/2593865 (pid 4043)] [pod t-f8605486-cl4b6-f9hhv] info     libmamba ****************** Backtrace End ********************
2024-07-16 13:39:08.057 [192694/start/2593865 (pid 4043)] [pod t-f8605486-cl4b6-f9hhv] error    libmamba Error opening for reading "/root/.local/share/mamba/pkgs/tk-8.6.13-noxft_h4845f30_101/info/index.json": No such file or directory
2024-07-16 13:39:08.057 [192694/start/2593865 (pid 4043)] [pod t-f8605486-cl4b6-f9hhv] error    libmamba Error when extracting package: [json.exception.parse_error.101] parse error at line 1, column 1: attempting to parse an empty input; check that your input string or stream contains the expected JSON
2024-07-16 13:39:08.057 [192694/start/2593865 (pid 4043)] [pod t-f8605486-cl4b6-f9hhv] critical libmamba Found incorrect downloads. Aborting
2024-07-16 13:39:10.017 [192694/start/2593865 (pid 4043)] Kubernetes error:
2024-07-16 13:39:10.017 [192694/start/2593865 (pid 4043)] Error (exit code 1). This could be a transient error. Use @retry to retry.
2024-07-16 13:39:10.118 [192694/start/2593865 (pid 4043)] 
2024-07-16 13:39:08.057 [192694/start/2593865 (pid 4043)] [pod t-f8605486-cl4b6-f9hhv]
2024-07-16 13:39:10.271 [192694/start/2593865 (pid 4043)] Task failed.
2024-07-16 13:39:10.398 Workflow failed.
2024-07-16 13:39:10.398 Terminating 0 active tasks...
2024-07-16 13:39:10.398 Flushing logs...
    Step failure:
This can be remedied in the short term by building the following Docker image and using it via the environment variable `METAFLOW_DEFAULT_CONTAINER_IMAGE=your_custom_image`:
FROM public.ecr.aws/docker/library/python:3.10.13

RUN apt update && apt install -y curl python3 python3-pip wget

# Install Micromamba
RUN curl -Ls https://micromamba.snakepit.net/api/micromamba/linux-64/1.5.5 | tar -xvj bin/micromamba && \
    mv bin/micromamba /usr/local/bin/

# Install Python dependencies
RUN pip install --no-cache-dir boto3==1.34.15 requests==2.22.0
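Once the image is built and pushed, one way to point a run at it is to set the variable only for the launching process. This is a minimal Python sketch, not an official Metaflow helper: `env_with_image` and `run_flow` are hypothetical names, and `flow.py` and `your_custom_image` are placeholders.

```python
import os
import subprocess
import sys


def env_with_image(image, base_env=None):
    """Return a copy of the environment with the default container image overridden."""
    if not image or image.strip() != image:
        raise ValueError("image must be a non-empty name without surrounding whitespace")
    # Copy so the caller's environment (or os.environ) is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["METAFLOW_DEFAULT_CONTAINER_IMAGE"] = image
    return env


def run_flow(flow_file, image):
    """Launch the flow on Kubernetes with the overridden image (hypothetical wrapper)."""
    cmd = [sys.executable, flow_file, "run", "--with", "kubernetes"]
    return subprocess.run(cmd, env=env_with_image(image), check=False)
```

For example, `run_flow("flow.py", "your_custom_image")` starts the run with the override applied to that process only, so other flows on the same machine keep the default image.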
a
thanks! @elegant-beach-10818 - you wouldn't need to install metaflow in the image.
e
thanks for the call out!
a
this is likely an issue with the latest micromamba release. we are taking a look
this should help for now
micromamba rolled out a release candidate for micromamba 2 that is causing these failures. unfortunately, that version is what the latest release tag points to. we are triaging the breaking changes, but meanwhile the new metaflow package, once released, should avoid pulling in the release candidate
r
Also on step functions
a
n
Just finished dealing with this error 💪
FYI 1.5.8 seems to be working fine too
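Since the breakage comes from the latest tag resolving to a release candidate, pinning an exact stable version in the download URL avoids it. A small Python sketch of that idea follows. It assumes the URL layout from the Dockerfile above also holds for other versions such as 1.5.8, and `pinned_micromamba_url` is a hypothetical helper, not part of Metaflow or micromamba.

```python
import re

# Base path taken from the Dockerfile in this thread; assumed to serve other versions too.
BASE_URL = "https://micromamba.snakepit.net/api/micromamba"

# Accept only plain X.Y.Z versions, so "latest" and release candidates are rejected.
_STABLE_VERSION = re.compile(r"\d+\.\d+\.\d+")


def pinned_micromamba_url(version, platform="linux-64"):
    """Build a download URL for an exact, stable micromamba version."""
    if not _STABLE_VERSION.fullmatch(version):
        raise ValueError(f"refusing unpinned or pre-release version: {version!r}")
    return f"{BASE_URL}/{platform}/{version}"
```

Calling `pinned_micromamba_url("1.5.8")` yields the URL to pass to `curl` in the image build, while `pinned_micromamba_url("latest")` raises instead of silently picking up whatever the tag points to.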
f
any idea when the release will make it through to conda-forge?
g
does this need to be updated in the "bleeding-edge" extension as well? https://github.com/Netflix/metaflow-nflx-extensions/blob/main/metaflow_extensions/netflix_ext/plugins/conda/conda.py#L1787 as we are still getting
2024-07-16 12:12:10.383 [268/train_model/956 (pid 72466)] [pod t-4d1f8816-8qs5v-4lqfx] metaflow_extensions.netflix_ext.plugins.conda.utils.CondaStepException: Step(s): ['train_model'], Error: Conda command '['/root/.local/bin/micromamba', 'info', '--json']' returned error (1); see pretty-printed error above
p
@square-wire-39606, in addition to tim's question about the timeframe for getting to conda-forge, we would love to stay apprised of your release of the ext lib.
a
@purple-airport-8225 the metaflow package is now available on PyPI. the conda package usually depends on how busy the build queue is (you can check the status here).
re: ext lib - pinging @dry-beach-38304 for timelines
d
making that release now with that small change.
should be out in < 5min