👋 A few months back, I asked a question here about deploying Metaflow to GCP. Naturally, after asking the question, I was almost immediately pulled into other projects and couldn't get back to this one until now. So let me try to re-ask in a better way:
I'm trying to deploy Metaflow to GCP using https://github.com/outerbounds/metaflow-tools/tree/master/gcp/terraform as my baseline. This code works and I have a Metaflow cluster up and running. The problem is I don't know how to expose this cluster to my developers. My ideal solution would be a load-balanced endpoint (e.g. metaflow.example.com) that correctly routes their requests to the UI/API/metadata service/etc. and is managed as part of the same Terraform we use to deploy Metaflow. If I'm reading the docs correctly, there are three parts missing from the current GCP example deployment:
1. The load balancer (IP, url map, other objects)
2. The k8s ingress controller holding the routes to the actual services
3. The backend service/network endpoint that connects 1 & 2.
1 is easy. There are plenty of examples of creating load balancers in GCP.
2 is harder. I have yet to see a good example of "here are the routes you need to put Metaflow behind a load balancer". Has anyone successfully made one of these (other than the not terribly helpful Helm chart in the metaflow-tools repo)?
3 is also a bit confusing to me. It looks like I need a backend service/network endpoint group to point at k8s, but it isn't clear how I actually connect those things.
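To make the question concrete, here is an untested sketch of roughly what I imagine parts 2 and 3 looking like in Terraform. It assumes the GKE ingress controller will create the backend services and network endpoint groups on its own once an Ingress exists, which is my understanding but not something I've confirmed for this setup. The service names, ports, and paths are guesses on my part and are not taken from the metaflow-tools repo, and the metadata service route is left out entirely.

```hcl
# Untested sketch. Service names, ports, and paths below are assumptions;
# check the real ones with `kubectl get svc` before using any of this.

# Static IP for the load balancer (part 1); the name is a placeholder.
resource "google_compute_global_address" "metaflow" {
  name = "metaflow-ingress-ip"
}

# Ingress holding the routes (part 2). With the "gce" class, GKE should
# provision the URL map, backend services, and NEGs itself (part 3).
resource "kubernetes_ingress_v1" "metaflow" {
  metadata {
    name = "metaflow"
    annotations = {
      "kubernetes.io/ingress.class"                 = "gce"
      "kubernetes.io/ingress.global-static-ip-name" = google_compute_global_address.metaflow.name
    }
  }

  spec {
    rule {
      host = "metaflow.example.com"
      http {
        # Assumed route for the UI backend API.
        path {
          path      = "/api/*"
          path_type = "ImplementationSpecific"
          backend {
            service {
              name = "metaflow-ui-backend-service" # assumed name
              port {
                number = 8083 # assumed port
              }
            }
          }
        }
        # Assumed catch-all route for the static UI.
        path {
          path      = "/*"
          path_type = "ImplementationSpecific"
          backend {
            service {
              name = "metaflow-ui-static-service" # assumed name
              port {
                number = 3000 # assumed port
              }
            }
          }
        }
      }
    }
  }
}
```

If that is the right general shape, then what I'm really missing is the correct list of services, ports, and path prefixes.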
So has anyone else successfully deployed Metaflow on GKE and have any advice for me? I know I can do a lot of this manually, but I'd vastly prefer to have all of it in Terraform so we can reproduce the infrastructure more consistently.