# ask-metaflow
l
How does Metaflow magically remove itself as a dependency in steps? Like, the fact that I can store stuff on `self.x` and have it backed up to S3... feels like custom logic. Like, the kind of logic you'd find in a 3rd party package called `metaflow` 😆 But if you do `pip freeze` within a step, I don't think you see any metaflow dependencies 🤯
e
When you launch Metaflow remotely, it packages its current installation along with your code and puts it in S3. Steps then rely on that code package and we run it in the current path.
❤️ 1
So we never pip install metaflow.
This guarantees you always get the same execution.
this 1
🙌 1
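(For illustration, here's a minimal sketch of the kind of flow you could run remotely to see this; the flow name and prints are made up. The point is that inside the task, `metaflow` resolves to the unpacked code package, not to a pip-installed copy.)

```python
from metaflow import FlowSpec, step


class WhereIsMetaflowFlow(FlowSpec):
    """Hypothetical flow: print where metaflow is imported from inside a task."""

    @step
    def start(self):
        import os

        import metaflow

        # When this runs remotely, the code package that was uploaded to S3 is
        # unpacked into the task's working directory and put on the path, so
        # metaflow.__file__ points inside that directory, not site-packages.
        print("metaflow loaded from:", metaflow.__file__)
        print("working directory:   ", os.getcwd())
        self.next(self.end)

    @step
    def end(self):
        print("done")


if __name__ == "__main__":
    WhereIsMetaflowFlow()
```

Run it locally and then with e.g. `run --with batch` (or `--with kubernetes`) and compare the printed paths.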
l
> Steps then rely on that code package and we run it in the current path.
Does that technically mean Metaflow is available in the step's venv (it's in the PYTHONPATH), but it doesn't show up in `pip freeze` because it's not "installed" via the process you'd normally use to install things?
e
yep
d
it’s just “installed” in PYTHONPATH — correct.
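(A quick sketch of the difference between "importable" and "installed as a distribution", which you could drop into a step:)

```python
import importlib.metadata
import importlib.util

# "Importable": found by walking sys.path, which includes the unpacked code package.
print(importlib.util.find_spec("metaflow") is not None)

# "Installed as a distribution": needs dist metadata on disk, which is what
# pip freeze enumerates; the packaged copy typically has none.
try:
    print(importlib.metadata.version("metaflow"))
except importlib.metadata.PackageNotFoundError:
    print("no 'metaflow' distribution found")
```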
l
OOoooOoOOh
Wow. Magical. I'm assuming this would be a problem if Metaflow had dependencies. But it doesn't, right?
d
(which is, incidentally, one reason why you don't need to, nor should you, include metaflow in any of your pypi/conda environments — it is ignored)
💡 2
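(Concretely, and assuming a recent Metaflow with the `@pypi` decorator, you only list your own packages; the flow name and pins below are made up:)

```python
from metaflow import FlowSpec, pypi, step


class OnlyYourDepsFlow(FlowSpec):
    """Hypothetical flow: pin your own packages; metaflow itself is not listed
    because it is shipped via the code package and would be ignored anyway."""

    @pypi(python="3.11", packages={"pandas": "2.2.2"})
    @step
    def start(self):
        import pandas as pd

        print("pandas", pd.__version__)
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    OnlyYourDepsFlow()
```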
e
For example, even if it were also installed via pip, the copy on the PYTHONPATH would still be the one we execute.
When you are running in conda, metaflow either injects its own deps on top of yours or we vendor the packages we need within metaflow.
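(Roughly why the packaged copy wins even if a pip-installed one exists: Python resolves imports by walking `sys.path` in order, so the unpacked code package, which sits ahead of site-packages, shadows it. A sketch with a made-up path:)

```python
import sys

# Imports resolve against sys.path in order, so whatever directory comes first
# shadows later entries. A metaflow package unpacked at a path like this one
# (made up) would therefore be used even if metaflow were also pip-installed.
sys.path.insert(0, "/tmp/task-workdir/code-package")

import metaflow

print(metaflow.__file__)  # the earliest sys.path entry containing metaflow wins
```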
l
I think at one point we were seeing an unexpected dependency conflict in a step:
• we were installing BentoML, which depends on a bunch of OTel libs
• but Metaflow also depends on OTel libs (optionally)
If Metaflow has deps enabled, is there a risk of them conflicting with packages in the step?
d
depending on your exact setup, it’s possible but the OSS one doesn’t do that.
(and by "exact setup" I mean that technically, in extensions, you can add files to the metaflow package and/or packages to the conda environment)
l
Thank you! Is this because the OSS metaflow simply has no dependencies?
Oh, lol, I guess it has these
d
those are injected in conda envs
opentelemetry is not
l
OOooh, okay got it! But these deps wouldn't conflict with a step if I installed/pinned them in a step? I suppose `requests` and `boto3` are extremely stable, so if Metaflow just leaves them unpinned when they are injected, then I'm sure it's fine if they are installed on top of the frozen env of the step.
Dang this is so interesting. Today I learned that `pip freeze` isn't as thorough as I thought it was (it doesn't include packages that are there but weren't installed).
d
I believe pip freeze looks at distributions, so yes, anything in your path that is not also a distribution (you can have distributions in your path) will not be listed. I haven't checked in detail though.
(and for completeness, extensions can change those pins: https://github.com/Netflix/metaflow/blob/master/metaflow/metaflow_config.py#L548)
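(To make the `pip freeze` point concrete, here's a small self-contained demo; the package name is a throwaway:)

```python
import importlib.metadata
import os
import sys
import tempfile

# Build a throwaway package on disk and make it importable via sys.path,
# which is effectively what adding a directory to PYTHONPATH does.
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "notinstalled"))
with open(os.path.join(tmp, "notinstalled", "__init__.py"), "w") as f:
    f.write("answer = 42\n")
sys.path.insert(0, tmp)

import notinstalled

print(notinstalled.answer)  # imports fine -> 42

# But it has no distribution metadata, so pip freeze (which enumerates
# installed distributions) would never list it.
try:
    importlib.metadata.version("notinstalled")
except importlib.metadata.PackageNotFoundError:
    print("'notinstalled' is importable but not an installed distribution")
```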
l
Thanks so much for the responses @enough-nest-7788 and @dry-beach-38304! I've wondered for a long time how the `metaflow` SDK could seem not to be there, yet still be usable in the step. You guys are awesome 🤩
👍 1
s
@lively-lunch-9285 a big reason why metaflow is packaged the way it is (and why remote tasks are launched the way they are) is to ensure that your deployed flows don't have any dependency on the world outside of your cloud account - helping hugely with the reproducibility and reliability of your runs.
a
also, requests and boto3 are technically soft dependencies - they are not used if you run a flow without a config that points to the metadata service or cloud storage, so technically the direct dependency count for metaflow is 0 - we will someday move them to optional...
💡 1
d
Ideally we should be following better standards and use extras (`[]`) to install optional things, but that is a whole other story. Requests and boto3 are hard dependencies for historical reasons more than anything else.
this 1