# ask-metaflow
m
I get this error when running the Python script in Kubernetes (with `@kubernetes`):

```
S3 access failed: S3 operation failed.
Key requested: s3://metaflow-test/metaflow/HelloCloudFlow/1/_parameters/1/0.attempt.json
Error: Unable to locate credentials
```

This is my config:

```json
{
  "METAFLOW_S3_ENDPOINT_URL": "http://minio.minio.svc.cluster.local:9000",
  "METAFLOW_DEFAULT_DATASTORE": "s3",
  "METAFLOW_DATASTORE_SYSROOT_S3": "s3://metaflow-test/metaflow",
  "METAFLOW_DATATOOLS_S3ROOT": "s3://metaflow-test/data",
  "METAFLOW_DEFAULT_METADATA": "service",
  "METAFLOW_KUBERNETES_SECRETS": "minio-secret",
  "METAFLOW_SERVICE_INTERNAL_URL": "http://metaflow-metaflow-service.metaflow.svc.cluster.local:8080",
  "METAFLOW_KUBERNETES_NAMESPACE": "metaflow",
  "METAFLOW_SERVICE_URL": "http://metaflow-metaflow-service.metaflow.svc.cluster.local:8080"
}
```

I have Metaflow installed in the `metaflow` namespace and MinIO in the `minio` namespace. I have created the secret in the `metaflow` namespace:

```bash
kubectl create secret generic minio-secret \
  --from-literal=AWS_ACCESS_KEY_ID=minioadmin \
  --from-literal=AWS_SECRET_ACCESS_KEY=minioadmin \
  -n metaflow
```

I hope that is correct. What could be the reason I get the S3 error?
i
Your secret creation, and mapping it to your Kubernetes jobs by setting `METAFLOW_KUBERNETES_SECRETS`, looks good to me. That value only loads secrets into steps that run with `@kubernetes` or `--with kubernetes`. If the code you are running is not running as a Kubernetes job, you will need to load the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` values as environment variables into your development environment.

For example, say you are running JupyterHub in a k8s cluster alongside Metaflow. With your configuration above, I would expect running your flow with `python my_flow.py run --with kubernetes` to succeed, and I would expect `python my_flow.py run` to fail. But by setting `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` in my Jupyter notebook environment, `python my_flow.py run` should succeed.
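For concreteness, a minimal sketch of both cases, assuming the same credentials that back `minio-secret` (the `minioadmin` values from your kubectl command above):

```bash
# Export the MinIO credentials into the local shell so plain local runs can reach S3/MinIO.
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin

# Local run: relies on the environment variables exported above.
python my_flow.py run

# Kubernetes run: each @kubernetes step gets the credentials injected from minio-secret
# via METAFLOW_KUBERNETES_SECRETS, so the local exports are not needed for this path.
python my_flow.py run --with kubernetes
```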
m
I have added those environment variables (`AWS_ACCESS_KEY_ID`, etc.) and local execution works, but with Kubernetes I now get this error:

```
File "/home/coder/metaflow_env/lib/python3.12/site-packages/metaflow/plugins/kubernetes/kubernetes_job.py", line 326, in execute
2025-01-09 16:03:12.131 [4/hello/10 (pid 86735)] raise KubernetesJobException(
2025-01-09 16:03:12.131 [4/hello/10 (pid 86735)] metaflow.plugins.kubernetes.kubernetes_job.KubernetesJobException: Unable to launch Kubernetes job.
2025-01-09 16:03:12.131 [4/hello/10 (pid 86735)] jobs.batch is forbidden: User "system:serviceaccount:coder:default" cannot create resource "jobs" in API group "batch" in the namespace "metaflow"
2025-01-09 16:03:12.145 [4/hello/10 (pid 86735)] Task failed.
2025-01-09 16:03:42.415 This failed task will not be retried.
Internal error: The end step was not successful by the end of flow.
```

I have even created a new service account with all permissions and added:

```json
"METAFLOW_KUBERNETES_NAMESPACE": "metaflow",
"METAFLOW_KUBERNETES_SERVICE_ACCOUNT": "metaflow-runner",
```
i
I'm glad to hear the local execution is now working. The error you posted is a separate issue.
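One possible direction, sketched under the assumption that the flow is being launched from a pod in the `coder` namespace running as its `default` service account (that is the identity named in the error): that identity needs RBAC permission to manage batch Jobs in the `metaflow` namespace. The role and binding names below are made up for illustration.

```bash
# Hedged sketch: allow system:serviceaccount:coder:default (the launching pod's identity,
# per the error above) to manage batch Jobs in the metaflow namespace.
# Metaflow may also need read access to pods and logs, so the verb/resource lists
# might need to be broader in practice.
kubectl create role metaflow-job-launcher \
  --verb=create,get,list,watch,patch,delete \
  --resource=jobs.batch \
  -n metaflow

kubectl create rolebinding metaflow-job-launcher-binding \
  --role=metaflow-job-launcher \
  --serviceaccount=coder:default \
  -n metaflow
```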
m
I guess running it without Argo should also work on Kubernetes, right? And who actually creates the pods — is it the Python environment where the flow runs, or the Metaflow service pod?
i
I believe `METAFLOW_KUBERNETES_SERVICE_ACCOUNT` is what gets injected into the Job's `serviceAccountName`, but this is reaching the end of my Metaflow knowledge. We need someone from @gentle-musician-94515 to help.
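One way to check that assumption, assuming the jobs land in the `metaflow` namespace per your config, is to inspect a launched Job directly:

```bash
# Print each Job's name and the service account its pod template runs as;
# if the config is picked up, the second column should show metaflow-runner.
kubectl get jobs -n metaflow \
  -o custom-columns=NAME:.metadata.name,SERVICE_ACCOUNT:.spec.template.spec.serviceAccountName
```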
m
So it is not possible that, when podA (running the Python/Metaflow process) executes, the jobs are created not in podA's namespace but in namespaceB — i.e., podA in namespaceA sends an HTTP request to the metaflow-service in namespaceB, and the service creates the jobs there?