helpful-honey-48997
06/30/2024, 4:50 PM
When I run the terraform apply ... command, I get a lot of errors like the following:
Cannot register provider Microsoft.KeyVault with Azure Resource Manager: resources.ProvidersClient#Register: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client '<redacted>' with object id '<redacted>' does not have authorization to perform action 'Microsoft.KeyVault/register/action' over scope '<redacted>' or the scope is invalid. If access was recently granted, please refresh your credentials.".
I get this error for a lot of Microsoft.*/register/action actions, and I'm not quite sure how to go about fixing this. Would anyone be able to help me figure out what permissions I'm missing for this specifically? (I'm new to Azure here so this might be pretty basic)
helpful-honey-48997
07/02/2024, 4:43 PM
helpful-honey-48997
07/02/2024, 7:06 PM
2024-07-02 21:02:20.456 [6/start/12 (pid 63739)] INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'https://stgenerablemfdefault.blob.core.windows.net/metaflow-storage-container/tf-full-stack-sysroot/IntegratedExperimentFlow/6/start/12/0.task_stderr.log'
2024-07-02 21:02:20.457 [6/start/12 (pid 63739)] Request method: 'GET'
2024-07-02 21:02:20.457 [6/start/12 (pid 63739)] Request headers:
2024-07-02 21:02:20.457 [6/start/12 (pid 63739)] 'x-ms-range': 'REDACTED'
2024-07-02 21:02:20.457 [6/start/12 (pid 63739)] 'x-ms-version': 'REDACTED'
2024-07-02 21:02:20.457 [6/start/12 (pid 63739)] 'Accept': 'application/xml'
2024-07-02 21:02:20.457 [6/start/12 (pid 63739)] 'User-Agent': 'azsdk-python-storage-blob/12.20.0 Python/3.9.18 (Linux-6.5.0-41-generic-x86_64-with-glibc2.38)'
2024-07-02 21:02:20.457 [6/start/12 (pid 63739)] 'x-ms-date': 'REDACTED'
2024-07-02 21:02:20.458 [6/start/12 (pid 63739)] 'x-ms-client-request-id': '9b8f098f-38a5-11ef-9230-c42360da3004'
2024-07-02 21:02:20.458 [6/start/12 (pid 63739)] 'Authorization': 'REDACTED'
2024-07-02 21:02:20.458 [6/start/12 (pid 63739)] No body was attached to the request
2024-07-02 21:02:20.458 [6/start/12 (pid 63739)] INFO:azure.core.pipeline.policies.http_logging_policy:Response status: 404
when I go to the metaflow UI, I can see the task/run failed, but I can't see any logs about the errors or the stdout. would anyone be able to provide any info about how to debug this? (@square-wire-39606 / @ancient-application-36103?)
ancient-application-36103
07/02/2024, 7:07 PM
ancient-application-36103
07/02/2024, 7:07 PM
helpful-honey-48997
07/02/2024, 7:10 PM
helpful-honey-48997
07/02/2024, 7:14 PM
helpful-honey-48997
07/02/2024, 7:17 PM
ancient-application-36103
07/02/2024, 7:20 PM
helpful-honey-48997
07/02/2024, 7:21 PM
logging.basicConfig level to info a long time ago. i'm not sure why it's getting 404s and not logging the errors leading to the run failure though...
helpful-honey-48997
07/02/2024, 7:21 PM
ancient-application-36103
07/02/2024, 7:21 PM
helpful-honey-48997
07/02/2024, 7:22 PM
ancient-application-36103
07/02/2024, 7:23 PM
--with kubernetes?
helpful-honey-48997
07/02/2024, 7:24 PM
square-wire-39606
07/02/2024, 7:24 PM
python flow.py run?
square-wire-39606
07/02/2024, 7:25 PM
helpful-honey-48997
07/02/2024, 7:25 PM
helpful-honey-48997
07/02/2024, 7:26 PM
ancient-application-36103
07/02/2024, 7:27 PM
kubernetes first.
helpful-honey-48997
07/02/2024, 7:29 PM
helpful-honey-48997
07/02/2024, 7:29 PM
pypi_base() decorator
helpful-honey-48997
07/02/2024, 7:30 PM
pypi_base() decorator) and see the run in the metaflow UI as successful (along with the produced cards), but still not see any of the logs
helpful-honey-48997
07/02/2024, 7:30 PM
ancient-application-36103
07/02/2024, 7:40 PM
helpful-honey-48997
07/02/2024, 10:38 PM
(metaflow) ruben@ruben-A6:~/work/test-metaflow/model_flows/virtual-patient$ METAFLOW_PROFILE=azure python test.py run --with kubernetes
Metaflow 2.10.8+netflix-ext(1.1.1) executing TestFlow for user:ruben
Validating your flow...
The graph looks good!
Running pylint...
Pylint not found, so extra checks are disabled.
2024-07-03 00:35:08.550 Workflow starting (run-id 12):
2024-07-03 00:35:10.397 [12/start/24 (pid 69362)] Task is starting.
2024-07-03 00:35:12.953 [12/start/24 (pid 69362)] [pod t-47ed747d-pkdsf-pp4cg] Task is starting (Pod is running, Container is running)...
2024-07-03 00:36:25.506 [12/start/24 (pid 69362)] Kubernetes error:
2024-07-03 00:36:25.592 [12/start/24 (pid 69362)] Error (exit code 1). This could be a transient error. Use @retry to retry.
2024-07-03 00:36:25.593 [12/start/24 (pid 69362)]
2024-07-03 00:36:25.707 [12/start/24 (pid 69362)] Task failed.
2024-07-03 00:36:25.824 Workflow failed.
2024-07-03 00:36:25.824 Terminating 0 active tasks...
2024-07-03 00:36:25.824 Flushing logs...
Step failure:
Step start (task-id 24) failed.
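A blank "Kubernetes error:" like the one above gives nothing to go on, but the `[pod ...]` tag in the same output names the pod, and the cluster can be asked about that pod directly. The following is a minimal sketch only: the helper names `extract_pod_name` and `kubectl_debug_commands` are made up for illustration, and the `default` namespace is an assumption (substitute whichever namespace the Metaflow deployment runs tasks in). It only builds the `kubectl` command lines and prints them; it does not run anything.

```python
import re
import shlex
from typing import List, Optional

# Matches the "[pod <name>]" tag that appears in the Metaflow output above.
POD_TAG = re.compile(r"\[pod ([^\]\s]+)\]")


def extract_pod_name(log_line: str) -> Optional[str]:
    """Return the pod name from a Metaflow log line, or None if it has no pod tag."""
    match = POD_TAG.search(log_line)
    return match.group(1) if match else None


def kubectl_debug_commands(pod: str, namespace: str = "default") -> List[str]:
    """Build kubectl commands showing a pod's status, container logs and events.

    The namespace default is an assumption, not something stated in the thread.
    """
    commands = [
        ["kubectl", "describe", "pod", pod, "-n", namespace],
        ["kubectl", "logs", pod, "-n", namespace, "--all-containers=true"],
        ["kubectl", "get", "events", "-n", namespace,
         "--field-selector", f"involvedObject.name={pod}"],
    ]
    return [shlex.join(cmd) for cmd in commands]


line = ("2024-07-03 00:35:12.953 [12/start/24 (pid 69362)] "
        "[pod t-47ed747d-pkdsf-pp4cg] Task is starting "
        "(Pod is running, Container is running)...")
pod_name = extract_pod_name(line)
if pod_name is not None:
    for command in kubectl_debug_commands(pod_name):
        print(command)
```

If the pod has already been garbage-collected, `kubectl logs` will have nothing to return, and the events listing is likely the more useful of the three.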
I can run it locally:
(metaflow) ruben@ruben-A6:~/work/test-metaflow/model_flows/virtual-patient$ python test.py run
Metaflow 2.10.8+netflix-ext(1.1.1) executing TestFlow for user:ruben
Validating your flow...
The graph looks good!
Running pylint...
Pylint not found, so extra checks are disabled.
2024-07-03 00:38:23.560 Workflow starting (run-id 1719959903550385):
2024-07-03 00:38:23.563 [1719959903550385/start/1 (pid 69802)] Task is starting.
2024-07-03 00:38:23.768 [1719959903550385/start/1 (pid 69802)] Task finished successfully.
2024-07-03 00:38:23.771 [1719959903550385/main/2 (pid 69805)] Task is starting.
2024-07-03 00:38:23.938 [1719959903550385/main/2 (pid 69805)] This is the main function
2024-07-03 00:38:23.973 [1719959903550385/main/2 (pid 69805)] This is the main function
2024-07-03 00:38:23.973 [1719959903550385/main/2 (pid 69805)] This is the main function
2024-07-03 00:38:23.973 [1719959903550385/main/2 (pid 69805)] This is the main function
2024-07-03 00:38:23.973 [1719959903550385/main/2 (pid 69805)] This is the main function
2024-07-03 00:38:23.973 [1719959903550385/main/2 (pid 69805)] This is the main function
2024-07-03 00:38:23.973 [1719959903550385/main/2 (pid 69805)] This is the main function
2024-07-03 00:38:23.974 [1719959903550385/main/2 (pid 69805)] This is the main function
2024-07-03 00:38:23.974 [1719959903550385/main/2 (pid 69805)] This is the main function
2024-07-03 00:38:23.974 [1719959903550385/main/2 (pid 69805)] This is the main function
2024-07-03 00:38:23.974 [1719959903550385/main/2 (pid 69805)] Task finished successfully.
2024-07-03 00:38:23.978 [1719959903550385/end/3 (pid 69808)] Task is starting.
2024-07-03 00:38:24.178 [1719959903550385/end/3 (pid 69808)] Task finished successfully.
2024-07-03 00:38:24.179 Done!
with this code:
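from metaflow import FlowSpec, step  # import not shown in the original snippet


class TestFlow(FlowSpec):
    @step
    def start(self) -> None:
        self.next(self.main)

    @step
    def main(self) -> None:
        for _ in range(10):
            print("This is the main function")
        self.next(self.end)

    @step
    def end(self) -> None:
        pass


if __name__ == "__main__":
    TestFlow()  # entry point not shown in the original snippet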
helpful-honey-48997
07/02/2024, 10:39 PM
square-wire-39606
07/03/2024, 1:56 AM
job.yaml -
apiVersion: batch/v1
kind: Job
metadata:
  name: test-job
spec:
  template:
    spec:
      containers:
      - name: test
        image: busybox
        command: ["echo", "Hello, Kubernetes!"]
      restartPolicy: Never
and then execute kubectl apply -f job.yaml followed by kubectl get jobs - what is the output?
helpful-honey-48997
07/03/2024, 5:51 AM
(metaflow) ruben@ruben-A6:~/work$ kubectl apply -f job.yaml
job.batch/test-job created
(metaflow) ruben@ruben-A6:~/work$ kubectl get jobs
NAME               COMPLETIONS   DURATION   AGE
t-14365adb-shg28   0/1           11h        11h
t-47ed747d-pkdsf   0/1           7h15m      7h15m
t-51796fab-pwjsm   0/1           7h11m      7h11m
t-b19d1a41-g47z8   0/1           7h21m      7h21m
t-bc2fb6c3-shdcz   0/1           10h        10h
t-ca8d42f2-26p8d   0/1           10h        10h
t-e6478428-4rnq8   0/1           10h        10h
t-fa2a45d6-5qjbb   0/1           7h18m      7h18m
t-ff0eb83d-7wbpb   0/1           10h        10h
test-job           1/1           5s         5s
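The listing above shows nine `t-...` jobs sitting at 0/1 next to the finished test-job. As a rough sketch of how to pull out the unfinished ones from `kubectl get jobs` text output, something like the following could work. The function name `incomplete_jobs` and the shortened `SAMPLE` listing are illustrative, not part of Metaflow or kubectl.

```python
from typing import List, Tuple


def incomplete_jobs(kubectl_output: str) -> List[Tuple[str, str]]:
    """Return (job name, age) for each job whose COMPLETIONS count is not yet full."""
    stuck = []
    for row in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = row.split()
        if len(fields) < 3:
            continue  # not a job row
        name, completions, age = fields[0], fields[1], fields[-1]
        done, wanted = completions.split("/")
        if int(done) < int(wanted):
            stuck.append((name, age))
    return stuck


# A shortened copy of the listing, for illustration.
SAMPLE = """\
NAME COMPLETIONS DURATION AGE
t-47ed747d-pkdsf 0/1 7h15m 7h15m
t-fa2a45d6-5qjbb 0/1 7h18m 7h18m
test-job 1/1 5s 5s
"""

for name, age in incomplete_jobs(SAMPLE):
    print(f"{name} has not completed after {age}")
```

Any job it flags can then be inspected with `kubectl describe job <name>`.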
(i changed my terminal size after running so i think there are some collapsed lines)
helpful-honey-48997
07/03/2024, 5:52 AM
ancient-application-36103
07/03/2024, 6:01 AM
helpful-honey-48997
07/03/2024, 6:23 AM
ancient-application-36103
07/03/2024, 6:24 AM
helpful-honey-48997
07/03/2024, 6:29 AM
helpful-honey-48997
07/03/2024, 6:30 AM
ancient-application-36103
07/03/2024, 6:31 AM
helpful-honey-48997
07/03/2024, 6:42 AM
ancient-application-36103
07/03/2024, 6:44 AM
ancient-application-36103
07/03/2024, 6:44 AM
helpful-honey-48997
07/03/2024, 6:45 AM
ancient-application-36103
07/03/2024, 6:47 AM
helpful-honey-48997
07/03/2024, 6:48 AM
helpful-honey-48997
07/03/2024, 6:56 AM
$ kubectl get pods --no-headers -o custom-columns=":metadata.name"
but the Azure web portal shows a pod for the logs.... although it seems like that pod failed in the job info page?
helpful-honey-48997
07/03/2024, 6:56 AM
helpful-honey-48997
07/03/2024, 7:41 AM