# ask-metaflow
m
I get this error when running the Python script in Kubernetes (with `@kubernetes`):

```
S3 access failed: S3 operation failed.
Key requested: s3://metaflow-test/metaflow/HelloCloudFlow/1/_parameters/1/0.attempt.json
Error: Unable to locate credentials
```

This is my config:

```json
{
  "METAFLOW_S3_ENDPOINT_URL": "http://minio.minio.svc.cluster.local:9000",
  "METAFLOW_DEFAULT_DATASTORE": "s3",
  "METAFLOW_DATASTORE_SYSROOT_S3": "s3://metaflow-test/metaflow",
  "METAFLOW_DATATOOLS_S3ROOT": "s3://metaflow-test/data",
  "METAFLOW_DEFAULT_METADATA": "service",
  "METAFLOW_KUBERNETES_SECRETS": "minio-secret",
  "METAFLOW_SERVICE_INTERNAL_URL": "http://metaflow-metaflow-service.metaflow.svc.cluster.local:8080",
  "METAFLOW_KUBERNETES_NAMESPACE": "metaflow",
  "METAFLOW_SERVICE_URL": "http://metaflow-metaflow-service.metaflow.svc.cluster.local:8080"
}
```

I have Metaflow installed in the `metaflow` namespace and MinIO in the `minio` namespace. I have created the secret in the `metaflow` namespace:

```bash
kubectl create secret generic minio-secret \
  --from-literal=AWS_ACCESS_KEY_ID=minioadmin \
  --from-literal=AWS_SECRET_ACCESS_KEY=minioadmin \
  -n metaflow
```

I hope that is correct. What could be the reason I get the S3 error?
i
Your secret creation, and mapping it to your Kubernetes jobs by setting `METAFLOW_KUBERNETES_SECRETS`, looks good to me. That value only loads secrets into steps that run with `@kubernetes` or `--with kubernetes`. If the code you are running is not running as a Kubernetes job, you will need to load the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` values as environment variables into your development environment.

For example, say you are running JupyterHub in a k8s cluster alongside Metaflow. With your configuration above, I would expect running your flow with `python my_flow.py run --with kubernetes` to succeed, and I would expect `python my_flow.py run` to fail. But by setting `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` in my Jupyter notebook environment, `python my_flow.py run` should succeed.
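For concreteness, a minimal sketch of both cases, assuming the same credentials that back `minio-secret` (the `minioadmin` values from your kubectl command above):

```bash
# Export the MinIO credentials into the local shell so plain local runs can reach S3/MinIO.
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin

# Local run: relies on the environment variables exported above.
python my_flow.py run

# Kubernetes run: each @kubernetes step gets the credentials injected from minio-secret
# via METAFLOW_KUBERNETES_SECRETS, so the local exports are not needed for this path.
python my_flow.py run --with kubernetes
```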
m
I have added those environment variables (`AWS_ACCESS_KEY_ID`, etc.) and local execution works, but with Kubernetes I now get this error:

```
File "/home/coder/metaflow_env/lib/python3.12/site-packages/metaflow/plugins/kubernetes/kubernetes_job.py", line 326, in execute
2025-01-09 16:03:12.131 [4/hello/10 (pid 86735)] raise KubernetesJobException(
2025-01-09 16:03:12.131 [4/hello/10 (pid 86735)] metaflow.plugins.kubernetes.kubernetes_job.KubernetesJobException: Unable to launch Kubernetes job.
2025-01-09 16:03:12.131 [4/hello/10 (pid 86735)] jobs.batch is forbidden: User "system:serviceaccount:coder:default" cannot create resource "jobs" in API group "batch" in the namespace "metaflow"
2025-01-09 16:03:12.145 [4/hello/10 (pid 86735)] Task failed.
2025-01-09 16:03:42.415 This failed task will not be retried.
Internal error: The end step was not successful by the end of flow.
```

I have even created a new service account with all permissions and added:

```json
"METAFLOW_KUBERNETES_NAMESPACE": "metaflow",
"METAFLOW_KUBERNETES_SERVICE_ACCOUNT": "metaflow-runner",
```
i
I'm glad to hear the local execution is now working. The error you posted is a separate issue.
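One possible direction, sketched under the assumption that the flow is being launched from a pod in the `coder` namespace running as its `default` service account (that is the identity named in the error): that identity needs RBAC permission to manage batch Jobs in the `metaflow` namespace. The role and binding names below are made up for illustration.

```bash
# Hedged sketch: allow system:serviceaccount:coder:default (the launching pod's identity,
# per the error above) to manage batch Jobs in the metaflow namespace.
# Metaflow may also need read access to pods and logs, so the verb/resource lists
# might need to be broader in practice.
kubectl create role metaflow-job-launcher \
  --verb=create,get,list,watch,patch,delete \
  --resource=jobs.batch \
  -n metaflow

kubectl create rolebinding metaflow-job-launcher-binding \
  --role=metaflow-job-launcher \
  --serviceaccount=coder:default \
  -n metaflow
```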
m
I guess running it without Argo should also work on Kubernetes, right? And who actually creates the pods — is it the Python environment where the flow runs, or the Metaflow service pod?
i
I believe `METAFLOW_KUBERNETES_SERVICE_ACCOUNT` is what gets injected into the Job's `serviceAccountName`, but this is reaching the end of my Metaflow knowledge. We need someone from @gentle-musician-94515 to help.
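One way to check that assumption, assuming the jobs land in the `metaflow` namespace per your config, is to inspect a launched Job directly:

```bash
# Print each Job's name and the service account its pod template runs as;
# if the config is picked up, the second column should show metaflow-runner.
kubectl get jobs -n metaflow \
  -o custom-columns=NAME:.metadata.name,SERVICE_ACCOUNT:.spec.template.spec.serviceAccountName
```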
m
So it is not possible that, when podA (running the Python/Metaflow process) executes, the jobs are created not in podA's namespace but in namespaceB — i.e., podA in namespaceA sends an HTTP request to the metaflow-service in namespaceB, and the service creates the jobs there?