# ask-metaflow
**s:** Hello! I am using a custom Docker image for some of my flows. This image has two versions of `glibc` on it:
```
$ whereis ldd
ldd: /usr/bin/ldd /opt/glibc-2.28/bin/ldd
```
`/usr/bin/ldd` is v2.26 and `/opt/glibc-2.28/bin/ldd` is v2.28. This is for a package I want to use, `onnxruntime`, which requires `glibc>=2.27`. When I try to install this package directly on the image, it works. But when done via Metaflow, it complains with:
```
onnxruntime-1.19.2-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl is not a supported wheel on this platform.
```
which is possibly because it does not recognize that 2.28 is available(?). Any ideas how to resolve this? I know Metaflow tinkers with `LD_LIBRARY_PATH`, but I cannot figure out whether that is interfering.
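One way to check which glibc the Python interpreter (and therefore pip) actually sees, independent of which `ldd` is on `PATH` — a stdlib-only sketch:

```python
import ctypes
import platform

# glibc version as the Python stdlib reports it; pip's manylinux tag
# detection is based on the interpreter's own view, not on `ldd`.
print(platform.libc_ver())

# Ask the C library loaded into this process directly (glibc-specific
# symbol; fails cleanly on musl or macOS).
try:
    libc = ctypes.CDLL(None)
    libc.gnu_get_libc_version.restype = ctypes.c_char_p
    print(libc.gnu_get_libc_version().decode())
except (OSError, AttributeError):
    print("not running against glibc")
```

If the interpreter that runs `pip install` inside the pod reports 2.26 here, pip will reject the `manylinux_2_27`/`manylinux_2_28` wheel regardless of what sits in `/opt/glibc-2.28`.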
I checked that the package's tags, `cp39-cp39-manylinux_2_28_x86_64` and `cp39-cp39-manylinux_2_27_x86_64`, are compatible with the Python on the image. I don't know if this is a hard constraint, but I'm putting it here for reference:
```
$ python -m pip debug -v
WARNING: This command is only meant for debugging. Do not use this with automation for parsing and getting these details, since the output and options of this command may change without notice.
pip version: pip 23.3.1 from /usr/local/lib/python3.9/site-packages/pip (python 3.9)
sys.version: 3.9.18 (main, Dec 21 2024, 14:21:25)
[GCC 7.3.1 20180712 (Red Hat 7.3.1-17)]
sys.executable: /usr/local/bin/python3.9
...
Compatible tags: 600
  cp39-cp39-manylinux_2_28_x86_64
  cp39-cp39-manylinux_2_27_x86_64
...
```
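For what it's worth, pip's supported-wheel check can be reproduced with the `packaging` library (assuming it is installed; pip vendors essentially the same logic), which makes it easy to run from the exact interpreter that later executes `pip install`:

```python
from packaging.tags import sys_tags
from packaging.utils import parse_wheel_filename

def wheel_supported(filename: str) -> bool:
    # A wheel is installable iff at least one of its tags is among
    # the tags this interpreter supports -- what pip checks before
    # raising "not a supported wheel on this platform".
    _, _, _, wheel_tags = parse_wheel_filename(filename)
    supported = set(sys_tags())
    return any(tag in supported for tag in wheel_tags)

print(wheel_supported(
    "onnxruntime-1.19.2-cp39-cp39-"
    "manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl"
))
```

The `manylinux_2_27+` tags are only generated when the detected glibc is new enough, so running this inside the bootstrapped environment on the pod would show whether that interpreter actually considers the wheel supported.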
**h:** You can try setting `CONDA_OVERRIDE_GLIBC=2.28`.
**s:** I launch my flow like
```
CONDA_OVERRIDE_GLIBC=2.28 python /path/to/flow.py --environment=conda run --with kubernetes
```
but no luck.
**h:** Does the machine you are launching from also have glibc 2.28?
**s:** Locally on my machine, no. Is `CONDA_OVERRIDE_GLIBC` passed through when bootstrapping the environment on the remote machine? And how does it work without this env variable when installing directly on the image?
**a:** @salmon-agency-70336 when you say *installing via Metaflow*, are you using `@conda` or `@pypi`? If so, the one included out of the box in Metaflow or the Netflix extensions?
**s:** So Metaflow + extensions:
```
Metaflow 2.13+netflix-ext(1.2.3) executing
```
I am using the `@pypi` decorator on my steps. But I believe that is not compatible with `--environment=pypi` when using the extensions package, so I ended up using `--environment=conda` when running the flow. Also, these are all the env variables I set when running:
```
CONDA_OVERRIDE_GLIBC=2.28 METAFLOW_CONDA_DEPENDENCY_RESOLVER="conda" CONDA_CHANNELS="conda-forge" METAFLOW_DEBUG_CONDA=1 python ...
```
**a:** `--environment=pypi` is a symlink to `--environment=conda`. Do you run into the same issue if you try executing without the extensions (you may have to uninstall them)?
**s:** Yes, I had the same issue without the extensions package; I installed the extensions while trying things out to hopefully make it work. About the `--environment` issue: I get this when I set it to `pypi`:
```
Incompatible environment:
    The pypi_base decorator requires --environment=conda
```
**a:** What is the full set of dependencies that you are specifying?
**s:**
```
typer[all]==0.9.4
pandas==1.5.3
numpy==1.26.4
tqdm==4.66.4
jsons==1.6.3
pyyaml==6.0.2
python-consul2==0.1.5
pydantic==2.8.2
jsonpath-ng==1.6.1
word2number==1.1
schema==0.7.5
envyaml==1.10.211231
boto3==1.35.58
requests==2.32.3
transformers==4.41.2
tokenizers==0.19.1
gliner==0.2.13
```
`onnxruntime` is a transitive dependency of `gliner`. FYI, `word2number==1.1` does not have a `.whl` file, but we have one in our private repo, so you can drop that.
**d:** I suspect resolution is going fine and it does resolve with packages that should work with 2.28, but the issue is likely that when installing, it's not finding the right glibc. You can maybe try setting some combination of `PATH` and `LD_LIBRARY_PATH` using the `@environment` decorator. I suppose you are getting an error while it is bootstrapping, correct? Do you have the full error log?
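For reference, a sketch of what that suggestion might look like using Metaflow's `@environment` decorator. The glibc paths are placeholders based on the `/opt/glibc-2.28` location mentioned above, the flow and package pins are illustrative, and prepending a different glibc to `LD_LIBRARY_PATH` can itself break unrelated binaries — so treat this as an experiment, not a fix:

```python
from metaflow import FlowSpec, environment, kubernetes, pypi, step


class PrepFlow(FlowSpec):
    # Hypothetical values: tell conda's resolver the pod has glibc 2.28
    # and point the linker/PATH at the newer glibc baked into the image.
    @environment(vars={
        "CONDA_OVERRIDE_GLIBC": "2.28",
        "LD_LIBRARY_PATH": "/opt/glibc-2.28/lib",
        "PATH": "/opt/glibc-2.28/bin:/usr/local/bin:/usr/bin:/bin",
    })
    @kubernetes
    @pypi(packages={"gliner": "0.2.13"})
    @step
    def start(self):
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    PrepFlow()
```

Unlike a `CONDA_OVERRIDE_GLIBC=...` prefix on the local launch command, variables set via `@environment` are injected into the remote task's environment.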
**s:** Yes, the error happens at the bootstrapping stage. I've tried setting `LD_LIBRARY_PATH` but that does not help for some reason.
```
[pod t-c7466149-9dbg7-2cc86] ERROR: onnxruntime-1.19.2-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl is not a supported wheel on this platform.
[pod t-c7466149-9dbg7-2cc86]
[pod t-c7466149-9dbg7-2cc86] No STDERR
[pod t-c7466149-9dbg7-2cc86] Traceback (most recent call last):
[pod t-c7466149-9dbg7-2cc86]   File "<frozen runpy>", line 198, in _run_module_as_main
[pod t-c7466149-9dbg7-2cc86]   File "<frozen runpy>", line 88, in _run_code
[pod t-c7466149-9dbg7-2cc86]   File "/home/metaflow/metaflow_extensions/netflix_ext/plugins/conda/remote_bootstrap.py", line 97, in <module>
[pod t-c7466149-9dbg7-2cc86]     bootstrap_environment(*sys.argv[1:])
[pod t-c7466149-9dbg7-2cc86]   File "/home/metaflow/metaflow_extensions/netflix_ext/plugins/conda/remote_bootstrap.py", line 60, in bootstrap_environment
[pod t-c7466149-9dbg7-2cc86]     my_conda.create_for_step(step_name, resolved_env, do_symlink=True),
[pod t-c7466149-9dbg7-2cc86]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[pod t-c7466149-9dbg7-2cc86]   File "/home/metaflow/metaflow_extensions/netflix_ext/plugins/conda/conda.py", line 431, in create_for_step
[pod t-c7466149-9dbg7-2cc86]     raise CondaStepException(e, [step_name]) from None
[pod t-c7466149-9dbg7-2cc86] metaflow_extensions.netflix_ext.plugins.conda.utils.CondaStepException: Step(s): ['prep_dataset'], Error: Could not install pypi dependencies using '['/home/micromamba/envs/metaflow_fd2cd46435561c7590b794174c4c51a93e2e3c84_99ff578188837cb0de8b11b862cbc89ed4dedd58/bin/python', '-m', 'pip', 'install', '--no-deps', '--no-input', '--no-compile', '-r', '/tmp/tmpsvxxeu2i']' -- got errorcode 1'; see pretty-printed error above
Task failed.
Task is starting (retry).
```
Also, setting `PATH` via the `@environment` decorator causes some other issues:
```
Dec 27 06:08:13.898 | root [3008] | ERROR | [Errno 2] No such file or directory: 'aws'
Traceback (most recent call last):
File "/Users/narayan/anaconda3/envs/metaflow-env-39/lib/python3.9/site-packages/metaflow/plugins/kubernetes/kubernetes_cli.py", line 271, in step
kubernetes.launch_job(
File "/Users/narayan/anaconda3/envs/metaflow-env-39/lib/python3.9/site-packages/metaflow/plugins/kubernetes/kubernetes.py", line 162, in launch_job
self._job = self.create_job_object(**kwargs).create().execute()
File "/Users/narayan/anaconda3/envs/metaflow-env-39/lib/python3.9/site-packages/metaflow/plugins/kubernetes/kubernetes_job.py", line 326, in execute
raise KubernetesJobException(
metaflow.plugins.kubernetes.kubernetes_job.KubernetesJobException: Unable to launch Kubernetes job.
jobs.batch is forbidden: User "system:anonymous" cannot create resource "jobs" in API group "batch" in the namespace "default"
2024-12-27 11:38:16.012 [3483/prep_dataset/19627 (pid 3008)] Task failed.
```
I don't know why it has my local env directory, `/Users/narayan/anaconda3/envs/metaflow-env-39/`, in the stack trace?