big-magician-57922
09/22/2023, 1:03 PM
Is it possible to set a different S3_ENDPOINT_URL for the local scheduler and for the created k8s pods (when using run --with kubernetes)?
Use case: local testing with a minio bucket in a kind cluster. Minio is running inside the kubernetes cluster, and I have exposed it with an Ingress mapped to localhost of the host machine (my laptop).
So the minio API can be accessed in two ways: on localhost from the host laptop (http://minio-api.localhost/), which the pods inside the cluster can't access, and via the minio Service inside the cluster (http://minio.minio.svc.cluster.local:9000/), which the host machine (laptop) can't directly access.
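To double-check this asymmetry, I used a small stdlib-only script (hostnames and ports are from my setup) and ran it once from the laptop and once from a debug pod:

```python
import socket

def reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failure, refusal, and timeout
        return False

# Each location should see exactly one of the two endpoints as reachable:
# the laptop resolves the Ingress hostname, the pod resolves the Service DNS.
endpoints = [
    ("minio-api.localhost", 80),              # Ingress: laptop-only
    ("minio.minio.svc.cluster.local", 9000),  # Service DNS: cluster-only
]
for host, port in endpoints:
    print(f"{host}:{port} reachable={reachable(host, port)}")
```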
When running my flow with "METAFLOW_S3_ENDPOINT_URL": "http://minio.minio.svc.cluster.local:9000/" in my config.json, the execution hangs for a long time and then errors:
S3 access failed:
S3 operation failed.
Key requested: s3://metaflow/metaflow/CardFlow/11/_parameters/20/0.attempt.json
Error: Could not connect to the endpoint URL: "http://minio.minio.svc.cluster.local:9000/metaflow/metaflow/CardFlow/11/_parameters/20/0.attempt.json"
So the local scheduler is not able to connect to minio.
When running my flow with "METAFLOW_S3_ENDPOINT_URL": "http://minio-api.localhost/" in my config.json, the scheduler is happy with that and begins creating the task container, but the pod hangs in ContainerCreating, and when checking the pod logs I get:
fatal error: Could not connect to the endpoint URL: "http://minio-api.localhost/metaflow/metaflow/CardFlow/data/bc/bc49cccd130578cfa8ae2e1ec76cb59e3bdcb1d2"
So the pod is not able to connect to minio.
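For completeness, here is the relevant slice of my config.json in this second attempt (other keys elided; the datastore keys are inferred from the s3://metaflow/metaflow/... paths in the errors, so treat them as approximate):

```json
{
  "METAFLOW_DEFAULT_DATASTORE": "s3",
  "METAFLOW_DATASTORE_SYSROOT_S3": "s3://metaflow/metaflow",
  "METAFLOW_S3_ENDPOINT_URL": "http://minio-api.localhost/"
}
```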
Presumably the local scheduler packages up each task in the flow and passes the S3 endpoint URL into each pod as an environment variable taken from config.json. Is there a way of overriding that env var inside each created pod, so that the scheduler can use localhost and the pods can use the internal k8s Service DNS name?
I would also like to get this working with Argo Workflows next, and expect to hit a similar problem there.
Any help is appreciated!