Morning folks. I have been reading about MetaFlow ...
# dev-metaflow
m
Morning folks. I have been reading about Metaflow for a while now and would love to give it a shot at my company. My problem is that we are GCP-only, and if I am ever to get this accepted I need to be able to deploy it with fully managed services. I have had a look at the infrastructure for AWS and it seems that GCP has all the pieces needed:
1. GKE in Autopilot mode should be a decent enough substitute for AWS Batch (different things, I know, but what matters here is the managed part of the equation). I think this requires a single-line change in the Terraform template, so it is the easiest change I can make.
2. I think GCP Workflows should be an easy substitute for Step Functions. I don’t know either product though, and although I imagine that Metaflow implements some sort of generic orchestrator class that defines the API needed, I was not able to find it.
3. GCP Secret Manager should be an easy substitute, but again, I haven’t looked deeply enough to understand what should be done to implement the secret manager type.
4. I suppose that an always-on Vertex or Colab notebook to monitor the flows should be an easy substitute for the SageMaker notebook with the same function.
All in all, my main question is: has anyone attempted this? Although it is my understanding that Metaflow has a plugin-based infrastructure, I cannot find docs that clarify the APIs to implement to create new plugins at the different layers (or maybe I am missing them). Sorry for the wall of text and possibly stupid questions, but I felt that at this point my progress was slowing down enough that asking would be a way better idea.
d
Hey, maybe a partial answer:
• for 2: there is Argo and Airflow support already, which may suit your need.
• for 3: that should be fairly straightforward, see here https://github.com/Netflix/metaflow/blob/master/metaflow/plugins/__init__.py#L112, you can provide another secrets provider mirroring the other two.
• not sure what you want to do with notebooks; it’s not a part of the OSS release at least. The UI is a good place to monitor execution, and anything that can use the Metaflow client can be used as well.
For developing plugins, the API is not stable in the sense that we don’t guarantee stability, but things haven’t changed in a while. There is a template to develop a plugin here: https://github.com/Netflix/metaflow-extensions-template. These are actual extensions: https://github.com/outerbounds/metaflow-ray and https://github.com/Netflix/metaflow-nflx-extensions. If you have questions, feel free to reach out and ping me directly.
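[Editor's note] As a rough illustration of the "anything that can use the Metaflow client" point, a notebook cell could summarize recent runs along these lines. This is only a sketch: `summarize_runs` is a hypothetical helper, and the stand-in objects below merely imitate the `id`, `finished`, and `successful` attributes that Metaflow client `Run` objects expose.

```python
from types import SimpleNamespace


def summarize_runs(runs, limit=5):
    """Collect id/finished/successful for the first `limit` run-like objects."""
    summary = []
    for run in runs:
        if len(summary) >= limit:
            break
        summary.append(
            {"id": run.id, "finished": run.finished, "successful": run.successful}
        )
    return summary


# Stand-in objects for illustration only; these are NOT real Metaflow runs.
fake_runs = [
    SimpleNamespace(id="423", finished=True, successful=True),
    SimpleNamespace(id="422", finished=True, successful=False),
    SimpleNamespace(id="421", finished=False, successful=False),
]

for row in summarize_runs(fake_runs, limit=2):
    print(row)
```

In an actual notebook with `metaflow` installed and configured, the iterable would presumably be something like `Flow("MyFlow")` from the `metaflow` package (the flow name here is a placeholder).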
a
For (2), the most paved path is to deploy Argo Workflows in your GKE cluster. For (3), you can either create a secrets plugin or, I believe, expose GCP secrets as secrets in the Kubernetes cluster, which are already supported by Metaflow out of the box.
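[Editor's note] To make the "create a secrets plugin" option a bit more concrete, here is a minimal sketch of what such a provider might look like. Everything in it is an assumption to check against the Metaflow source linked above: the class name, the `TYPE` string, and the option names are hypothetical, and the `get_secret_as_dict` method name reflects my reading of the existing providers rather than a documented contract. The GCP call is replaced by an injected function so the sketch runs without any cloud dependency.

```python
import json
from typing import Callable, Dict, Optional


class GcpSecretsProviderSketch:
    """Hypothetical provider; a real extension would subclass Metaflow's
    secrets provider base class and call the GCP Secret Manager client."""

    # Hypothetical identifier a flow would use to select this provider.
    TYPE = "my-gcp-secrets"

    def __init__(self, fetch_payload: Callable[[str], bytes]):
        # fetch_payload stands in for the real GCP lookup (assumption).
        self._fetch_payload = fetch_payload

    def get_secret_as_dict(
        self, secret_id: str, options: Optional[dict] = None, role=None
    ) -> Dict[str, str]:
        options = options or {}
        raw = self._fetch_payload(secret_id).decode("utf-8")
        # If the payload is a JSON object, expose each key as its own entry.
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict):
            return {str(k): str(v) for k, v in parsed.items()}
        # Otherwise treat it as one plain string under a caller-chosen name.
        # "env_var_name" is a hypothetical option name for this sketch.
        return {options.get("env_var_name", "SECRET_VALUE"): raw}


# Usage with a fake backend instead of GCP:
provider = GcpSecretsProviderSketch(lambda sid: b'{"DB_USER": "alice"}')
print(provider.get_secret_as_dict("projects/example/secrets/db"))
```

Injecting the fetch function keeps the parsing logic testable on its own; wiring it into Metaflow would still require registering the class through an extension package such as the template mentioned above.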
h
Hi there, maybe someone here could help me please? I recently encountered this issue: https://github.com/Netflix/metaflow/issues/1496. I followed the suggested solution and created a new environment with metaflow 2.9.11 and conda 22.11.1 (using the linked miniconda as suggested) but still get the JSON error, so I am not sure what else to do. Will there be a release of Metaflow that works with newer conda versions? Thanks!
d
That’s weird that it doesn’t work even with 22.11.1. I haven’t looked at that implementation of conda in a while, but you can try the one linked above, https://github.com/Netflix/metaflow-nflx-extensions (available as a package here: https://pypi.org/project/metaflow-netflixext/). It’s not officially supported, but it will give you a conda implementation that should work with recent versions of conda (plus a few other features). It should be a drop-in replacement.
h
Amazing thanks. Will have a look later.
👍 1
@dry-beach-38304 can I please ask, should I remove metaflow 2.9.11 and use this instead and change all my imports?
d
No. You keep metaflow and you just install this package as well. There are instructions in the README on GitHub.
👍 1
h
I believe I followed the instructions but for even a simple:
import metaflow
print(metaflow.__version__)
I now get:
ValueError: Cannot locate step_decorator plugin 'pip' at 'metaflow_extensions.netflix_ext.plugins.conda.conda_step_decorator'
or running my original code I get:
ValueError: Cannot locate step_decorator plugin 'conda' at 'metaflow_extensions.netflix_ext.plugins.conda.conda_step_decorator'
Any idea? Thanks!
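[Editor's note] Since `import metaflow` itself fails here, one way to confirm which versions are installed without importing the package is to read the installed distribution metadata. A minimal sketch using only the standard library (`installed_versions` is a hypothetical helper name; the distribution names are the ones mentioned in this thread):

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Reads package metadata only, so a broken plugin import does not get in the way.
print(installed_versions(["metaflow", "metaflow-netflixext"]))
```

This only reports what is installed; it does not verify that the extension's files are complete.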
d
Let me get back to you shortly.
oh, I apologize — I forgot to merge a file. Gimme a few minutes and I’ll push an update for you.
h
oh thanks. btw apologies for hijacking this thread, which was originally about something else … just for my understanding: the current metaflow does not play nice with the latest conda, so metaflow-netflixext is required for the time being? The suggested solution here: https://github.com/Netflix/metaflow/issues/1496 did not work.
d
Not quite:
• metaflow does not play nice with the latest conda (or rather, the latest conda does not play nice with metaflow; they seem to have broken something)
• for some reason the proposed solution in #1496 did not work (not sure why, I didn’t spend time investigating)
• metaflow-netflixext is not required per se, but it is an alternative implementation of the conda decorator that is currently used at Netflix and will, at some point, make its way in one way or another to the main repository.
ok, 1.0.1 should be out in a few minutes. That should fix your issue. Apologies for that — as I mentioned, it’s an internal plugin at Netflix and I copied over all the code and like an idiot forgot one file 😞
❤️ 1
(well 2)
It should be out. let me know if you have more questions — may be slightly unavailable for a bit but will respond when I can.
h
thanks so much! looks good:
Validating your flow...
The graph looks good!
Running pylint...
Pylint is happy!
Bootstrapping Conda environment... (this could take a few minutes)
Resolving 1 environment ...
will run more tomorrow during official work time 😀
👍 1
@dry-beach-38304 - sorry, only got around to running the flow today - I get:
2023-09-14 155740.828 [423/start/1698 (pid 74445)] [ec94ac21-fe98-4b48-b74d-67c4361bdc47] my_conda.binary("micromamba")
2023-09-14 155740.828 [423/start/1698 (pid 74445)] [ec94ac21-fe98-4b48-b74d-67c4361bdc47] File "/metaflow/metaflow_extensions/netflix_ext/plugins/conda/conda.py", line 164, in binary
2023-09-14 155740.828 [423/start/1698 (pid 74445)] [ec94ac21-fe98-4b48-b74d-67c4361bdc47] self._find_conda_binary()
2023-09-14 155740.828 [423/start/1698 (pid 74445)] [ec94ac21-fe98-4b48-b74d-67c4361bdc47] File "/metaflow/metaflow_extensions/netflix_ext/plugins/conda/conda.py", line 1720, in _find_conda_binary
2023-09-14 155740.828 [423/start/1698 (pid 74445)] [ec94ac21-fe98-4b48-b74d-67c4361bdc47] self._download_validate_remote_conda()
2023-09-14 155740.828 [423/start/1698 (pid 74445)] [ec94ac21-fe98-4b48-b74d-67c4361bdc47] AttributeError: 'Conda' object has no attribute '_download_validate_remote_conda'
which I think is due to the metaflow_extensions/netflix_ext package? I also think the latest conda version fixed the issue?
d
It may, I haven’t checked. For your specific issue, gah, I think this is another merge issue on my side (we have a slightly different version internally). One sec.
h
thanks, this is why I mentioned it - it probably would have been better to raise an issue on GitHub …
d
1.0.2 should fix it. I’ll test more fully a little later (meetings this morning)
👍 1
sorry about that. There are like 5 lines difference between the internal and external version and of course I messed it up 😞
h
no worries - thanks for quick reply - and good evening from the UK 😀