acoustic-van-30942
01/24/2025, 10:29 PM
jobs.batch is forbidden: User "system:anonymous" cannot create resource "jobs" in API group "batch" in the namespace "xyz"
I'm able to connect to the Kubernetes cluster outside of Metaflow from the Linux terminal on my remote dev box, and I'm not sure why it's identifying me as system:anonymous.
acoustic-van-30942
01/24/2025, 10:35 PM
{
  "METAFLOW_KUBERNETES_CONTAINER_IMAGE": "metaflow_batch_sample:main-6463f16454-183",
  "METAFLOW_KUBERNETES_CONTAINER_REGISTRY": "dummyaccountnum.dkr.ecr.us-west-2.amazonaws.com",
  "METAFLOW_DATASTORE_SYSROOT_S3": "s3://dummyaccount-s3-amp/metaflow",
  "METAFLOW_DATATOOLS_S3ROOT": "s3://dummyaccount-s3-amp/data",
  "METAFLOW_DEFAULT_DATASTORE": "s3",
  "METAFLOW_DEFAULT_METADATA": "service",
  "METAFLOW_ECS_FARGATE_EXECUTION_ROLE": "arn:aws:iam::dummyaccountnum:role/dummyaccount-ecs-execution-role-amp",
  "METAFLOW_ECS_S3_ACCESS_IAM_ROLE": "arn:aws:iam::dummyaccountnum:role/StudioIAMRole",
  "METAFLOW_SERVICE_INTERNAL_URL": "http://123-metadata-nlb-amp-c9275d2461784673.elb.us-west-2.amazonaws.com/",
  "METAFLOW_SERVICE_URL": "https://metaflow-service.xyz.cloudos.test.com",
  "METAFLOW_KUBERNETES_NAMESPACE": "xyz",
  "METAFLOW_KUBERNETES_SERVICE_ACCOUNT": "xyz-ksa",
  "METAFLOW_ARGO_EVENTS_EVENT_BUS": "jobs-eventbus",
  "METAFLOW_ARGO_EVENTS_EVENT_SOURCE": "argo-events-webhook",
  "METAFLOW_ARGO_EVENTS_SERVICE_ACCOUNT": "operate-workflow-sa",
  "METAFLOW_ARGO_EVENTS_EVENT": "metaflow-event",
  "METAFLOW_ARGO_EVENTS_WEBHOOK_URL": "https://argo-events-webhook.xyz.dev.test.com",
  "METAFLOW_DEFAULT_SECRETS_BACKEND_TYPE": "aws-secrets-manager",
  "TEAM_COST_ATTRIBUTION_TAG": "tenant-totqpe",
  "METAFLOW_SERVICE_AUTH_KEY": "dummy-auth"
}
hundreds-zebra-57629
01/25/2025, 12:20 AM
kubectl in the environment you are running Metaflow? Do you have the assumed role's credentials in the environment?
acoustic-van-30942
01/25/2025, 12:35 AM
hundreds-zebra-57629
01/25/2025, 12:41 AM
from kubernetes import client, config

# Load Kubernetes configuration from the environment
config.load_kube_config()

# Create an API client
v1 = client.CoreV1Api()

# List all pods in all namespaces
pods = v1.list_pod_for_all_namespaces(watch=False)

# Print the name and namespace of each pod
for pod in pods.items:
    print(f"Pod Name: {pod.metadata.name}, Namespace: {pod.metadata.namespace}")
You don't need to share the output of the code, but let me know if it actually lists pods.
acoustic-van-30942
01/25/2025, 1:03 AM
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '393043a4-cc27-4469-9bf1-f48c6f98c99c', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'c42ebf89-2913-4d64-a29a-2e4106cd7dd7', 'X-Kubernetes-Pf-Prioritylevel-Uid': '2dfdc37d-aa44-4431-a64e-bb989ad35b96', 'Date': 'Sat, 25 Jan 2025 00:59:38 GMT', 'Content-Length': '262'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:anonymous\" cannot list resource \"pods\" in API group \"\" in the namespace \"cagepart\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}
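A 403 for "system:anonymous" generally means the request reached the API server without usable credentials. One way to narrow this down is to check which credentials the Python client loaded from the kubeconfig. The following is a minimal sketch, not a definitive diagnostic: the helper names `summarize_api_key` and `loaded_credential_names` are hypothetical, and it assumes the `kubernetes` package is installed and the kubeconfig is in its default location.

```python
def summarize_api_key(api_key: dict) -> list:
    """Return the names of non-empty credentials, never the token values."""
    return sorted(name for name, value in api_key.items() if value)


def loaded_credential_names() -> list:
    """Load the kubeconfig into a fresh Configuration and list what was set.

    Hypothetical helper: requires the `kubernetes` package and a kubeconfig.
    The import is local so the pure helper above works without the package.
    """
    from kubernetes import client, config

    cfg = client.Configuration()
    # Populate cfg from the kubeconfig (runs the exec plugin if one is set).
    config.load_kube_config(client_configuration=cfg)
    return summarize_api_key(cfg.api_key)


if __name__ == "__main__":
    try:
        print("Credentials loaded from kubeconfig:", loaded_credential_names())
    except Exception as exc:  # missing package, missing kubeconfig, exec failure
        print("Could not load kubeconfig:", exc)
```

An empty list would suggest no token was loaded at all. A non-empty list does not by itself prove that the token is actually sent with each request.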
acoustic-van-30942
01/25/2025, 1:03 AM
sagemaker-user@studio$ kubectl get pods --namespace cagepart
I get...
No resources found in cagepart namespace.
acoustic-van-30942
01/25/2025, 1:07 AM
sagemaker-user@studio$ kubectl get rayclusters
Error from server (Forbidden): rayclusters.ray.io is forbidden: User "cagepart-kuberay-developer" cannot list resource "rayclusters" in API group "ray.io" in the namespace "default"
sagemaker-user@studio$ kubectl get rayclusters --namespace cagepart
No resources found in cagepart namespace.
hundreds-zebra-57629
01/25/2025, 1:12 AM
acoustic-van-30942
01/25/2025, 1:15 AM
apiVersion: v1
clusters:
- cluster:
    server: <NETWORK_LOAD_BALANCER_URL>
    insecure-skip-tls-verify: true
  name: <EKS_CLUSTER_ARN>
contexts:
- context:
    cluster: <EKS_CLUSTER_ARN>
    user: <EKS_CLUSTER_ARN>
  name: <EKS_CLUSTER_ARN>
current-context: <EKS_CLUSTER_ARN>
kind: Config
preferences: {}
users:
- name: <EKS_CLUSTER_ARN>
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - --region
      - us-west-2
      - eks
      - get-token
      - --cluster-name
      - <CLUSTER_NAME>
      - --output
      - json
      command: aws
      env: null
      interactiveMode: IfAvailable
      provideClusterInfo: false
hundreds-zebra-57629
01/25/2025, 1:16 AM
aws eks update-kubeconfig --name <cluster-name> ?
acoustic-van-30942
01/25/2025, 1:17 AM
aws eks update-kubeconfig --name <cluster-name>, but I have to update it manually because we use a secure network load balancer instead of the kube API server directly, and we also set insecure-skip-tls-verify: true
acoustic-van-30942
01/27/2025, 10:22 PM
hundreds-zebra-57629
01/27/2025, 10:23 PM
acoustic-van-30942
01/27/2025, 10:23 PM
29.0.0, so I downgraded kubernetes to 29.0.0 and it finally works
acoustic-van-30942
01/27/2025, 10:23 PM
32.0.0 doesn't work, so I downgraded it
acoustic-van-30942
01/27/2025, 10:37 PM
2025-01-27 22:28:35.033 [170/start/2669 (pid 2970)] Task finished successfully.
2025-01-27 22:28:35.499 [170/process/2670 (pid 2990)] Task is starting.
2025-01-27 22:28:38.049 [170/process/2670 (pid 2990)] [job t-5fdbf935-2vfgg] Task is starting (Job status is unknown)...
2025-01-27 22:33:32.342 1 task is running: process (1 running; 0 done).
2025-01-27 22:33:32.342 No tasks are waiting in the queue.
2025-01-27 22:33:32.342 end step has not started
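Since the fix here was downgrading the kubernetes Python package from 32.0.0, a quick check of the installed version can save some debugging time. This is a minimal sketch using only the standard library. The names `AFFECTED_VERSIONS`, `is_affected`, and `installed_client_version` are hypothetical, and the set of affected versions is an assumption based solely on what was reported in this thread.

```python
from importlib import metadata

# Assumption: only 32.0.0 is flagged, because that is the one release
# reported as failing in this thread. Adjust as more information appears.
AFFECTED_VERSIONS = {"32.0.0"}


def is_affected(version: str) -> bool:
    """Return True if the given kubernetes client version is flagged."""
    return version.strip() in AFFECTED_VERSIONS


def installed_client_version():
    """Return the installed `kubernetes` package version, or None if absent."""
    try:
        return metadata.version("kubernetes")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_client_version()
    if found is None:
        print("The kubernetes package is not installed in this environment.")
    elif is_affected(found):
        print(f"kubernetes=={found} is flagged; consider pinning an older release.")
    else:
        print(f"kubernetes=={found} is not flagged.")
```

If the version is flagged, pinning the package in the environment that launches the flow (for example `pip install "kubernetes==29.0.0"`, the version that worked above) is one option.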
jolly-pharmacist-95459
01/28/2025, 8:59 PM
32.0.0. The last working version is 31.0.0.
This also seems like a related issue: https://github.com/kubernetes-client/python/issues/2334
creamy-stone-99746
02/08/2025, 12:24 AM
acoustic-van-30942
02/08/2025, 12:44 AM