# ask-metaflow
p
Hi everyone! I'm working on an on-premise Kubernetes deployment for Metaflow, and I noticed that several issues and questions on this topic date back to 2023. However, I couldn't find any recent updates or official guidance since then. Could someone please share any current best practices, documentation, or successful configurations for deploying Metaflow on an on-prem K8s setup? Any help or insights would be greatly appreciated. Thanks in advance!
h
Hey @powerful-knife-41200, unfortunately I am not aware of an e2e on-prem setup doc, but we ought to have one. For now, you can use this doc as a good starting point. Here are a few additional notes:
1. Ignore steps 7 and 11-13. These are specific to minikube and Airflow setups. You can use steps 11-13 if you are trying to use Airflow.
2. If you simply want to get started with Metaflow and not use the UI, I believe you can just use the localhost:9000 address from step 4 as `METAFLOW_S3_ENDPOINT_URL` in step 10.
3. If you are trying to get the Metaflow UI working as well, there is a bit of an unsolved challenge (afaik). The UI backend pod uses the same `METAFLOW_S3_ENDPOINT_URL` specified in your local Metaflow config, which means the DNS name/IP you use there needs to be reachable both from your local machine and from the UI backend pod. This is not an issue if your local setup is in the same network perimeter as the UI backend pod. If it is not, you will need to figure out how to make the same name reach MinIO both locally and from the UI backend pod.
One hacky solution is to set the MinIO k8s service domain as the `METAFLOW_S3_ENDPOINT_URL`, so the UI backend can easily resolve the domain. For the local part, you can port-forward the MinIO pod and then edit `/etc/hosts` to point the MinIO k8s service name at localhost. This will make the service name resolve to localhost on your machine. Of course there may be a better solution out there (I'd encourage you to share if you come up with a better approach).
Finally, I haven't tried running argo-workflows on-prem, but I don't believe there would be any gotchas there. Also, please do share the issues you deem relevant in this thread and we can attempt to address them.
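For anyone following along, here is a rough sketch of the hacky workaround above. It only prints the two pieces you would need: the `/etc/hosts` line and the endpoint setting. The service name `minio.default.svc.cluster.local` is a made-up placeholder (use whatever your MinIO service is actually called), and port 9000 is taken from the localhost:9000 address mentioned earlier. It assumes you are already port-forwarding MinIO to your machine on that port, e.g. with `kubectl port-forward`.

```python
import json

# Hypothetical values: replace with your own MinIO service name and port.
MINIO_SERVICE_HOST = "minio.default.svc.cluster.local"  # assumed k8s service DNS name
MINIO_PORT = 9000  # matches the localhost:9000 address mentioned in the thread


def hosts_entry(service_host: str, ip: str = "127.0.0.1") -> str:
    """Build the /etc/hosts line that maps the service name to localhost."""
    return f"{ip} {service_host}"


def metaflow_s3_settings(service_host: str, port: int) -> dict:
    """Build the endpoint setting used by both the local client and the UI backend."""
    return {"METAFLOW_S3_ENDPOINT_URL": f"http://{service_host}:{port}"}


if __name__ == "__main__":
    print("# Line to add to /etc/hosts (needs root):")
    print(hosts_entry(MINIO_SERVICE_HOST))
    print("# Setting to merge into your local Metaflow config:")
    print(json.dumps(metaflow_s3_settings(MINIO_SERVICE_HOST, MINIO_PORT), indent=2))
```

With this in place, the same URL resolves to the in-cluster service from the UI backend pod and to the port-forwarded MinIO from your laptop.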
a
My two cents here: I have set up Metaflow on-prem and it's more or less functioning. I do not have any detailed documentation I can help you with, but if you have specific questions, just write them here or DM me. I can check whether I ran into the same problem and share my approach here.
p
This is the issue I mentioned before. My main problem is finding anything about Metaflow architecture and administration. The documentation only talks about the client side and how to use it.
The docs for Metaflow alternatives thoroughly cover how to run the tool, what the components are, what role each one plays, and how it works. Because I am an ML engineer with a DevOps background, I need to know about the operational side of Metaflow, its resiliency, and how to manage it in my company, so that other ML and DS teams can use it.
a
I do agree - the Metaflow documentation has some blind spots, and I had a hard time setting up my cluster as well. As I already said, I cannot provide detailed documentation, but I am happy to take a look at any questions you might have.
p
@adorable-oxygen-86530 I really appreciate your help. Here are some basic questions: does Metaflow have a control plane component that you deploy on k8s and then submit flows to? What are the components?
I will try to use the provided docs to deploy Metaflow and will then ask my operational questions too. Thanks a lot!
b
@adorable-oxygen-86530 I'm currently evaluating whether Metaflow would be a good fit for my group as well, and ran into some issues working with a localstack setup. I've used the Metaflow helm charts provided here. What were some of the problems you ran into? I wonder whether those problems have been solved by these helm charts, and/or whether I need to keep an eye out for things I haven't encountered yet. Thanks in advance for any feedback!
a
Hey @bumpy-piano-21462, well, I did not run into too many things related to the helm charts. They're actually pretty solid, and I do recommend using them since they work together and you don't have to think about wiring everything up on your own.
I guess it's really easiest if you give it a shot, and if you encounter any problems I can check whether we ran into the same issue and what helped in our stack.
b
@adorable-oxygen-86530 yes, the helm charts are a time saver. My issue is specific to how the Jupyter notebooks are loaded back in the UI after a workflow run. I noticed the HTML generated from the card gets saved, but the Metaflow UI service is not able to see it. Any chance you've seen that while you were setting up your environment?
a
Hey @bumpy-piano-21462, sorry, we don't use the notebook cards. We stick to the traditional markdown and a couple of custom templates. Do the logs give you a hint? Really sorry that I am not of any help here.
b
@adorable-oxygen-86530 gotcha, yeah, I've been heads down looking for the source of the problem. It seems like the way Metaflow looks up those records is failing, but that is probably an issue with the notebook card. If anything, I'll post an issue on their repo if I can't get it to work soon 😄 Thanks either way, I'll post what I find about it here.