melodic-train-1526
03/16/2022, 1:09 PM
job_definition['linuxParameters']['devices'] = {
    "containerPath": "/dev/neuron0",
    "hostPath": "/dev/neuron0",
    "permissions": [
        "read",
        "write"
    ]
}
, when we specify the @batch(inferentia=True,...) parameter. This works insofar as Metaflow lets us run the flow and our preprocessing steps, but we are hitting a very uninformative error somewhere in the process of sending the job to Batch. The error just tells us:
Usage: main.py batch step [OPTIONS] STEP_NAME CODE_PACKAGE_SHA
2022-03-15 17:01:59.282 [1776/infer_video/100870 (pid 15323)] CODE_PACKAGE_URL
2022-03-15 17:01:59.282 [1776/infer_video/100870 (pid 15323)] Try 'main.py batch step --help' for help.
2022-03-15 17:01:59.282 [1776/infer_video/100870 (pid 15323)]
2022-03-15 17:01:59.282 [1776/infer_video/100870 (pid 15323)] Error: Got unexpected extra argument (432000)
Has anyone tried to add new batch parameters in a branch/fork of Metaflow? Has anyone seen this error, or could anyone advise? Thanks in advance.
ancient-application-36103
03/16/2022, 5:55 PM
inferentia here - https://github.com/Netflix/metaflow/blob/master/metaflow/plugins/aws/batch/batch_cli.py#L153
ancient-application-36103
03/16/2022, 5:56 PM
melodic-train-1526
03/16/2022, 6:57 PM
melodic-train-1526
03/17/2022, 9:46 PM
inferentia=1 for example, it throws a type error, because in my code in batch_client.py I check that it is a boolean. I have added it in the batch_cli.py, batch.py, batch_client.py and batch_decorator code, but I am still getting the above error, presumably because it sees --inferentia --run-time-limit 432000 and doesn’t properly recognise the --inferentia ?? Can you advise?
running /home/ec2-user/pytorch_venv/bin/python3 main.py --quiet --metadata service --environment local --datastore s3 --event-logger nullSidecarLogger --monitor nullSidecarMonitor --datastore-root s3://metaflow-s3-development-euwe1/metaflow batch step infer_video 445e09c5381ee16b18570566ddfe471a9f338487 s3://metaflow-s3-development-euwe1/metaflow/Inferrer/data/44/445e09c5381ee16b18570566ddfe471a9f338487 --run-id 1841 --task-id 101118 --input-paths 1841/start/101113 --split-index 4 --retry-count 0 --max-user-code-retries 0 --namespace user:ec2-user --cpu 1 --gpu 0 --memory 7900 --image 619782547715.dkr.ecr.eu-west-1.amazonaws.com/ml_research/metaflow_inference_docker:inferentia --queue inf-metaflow-development-euwe1 --iam-role arn:aws:iam::619782547715:role/metaflow-batch_s3_task_role-development-euwe1 --inferentia --run-time-limit 432000
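One possible reading of the "unexpected extra argument (432000)" error, sketched with click, the library whose usage and error format appears in the log: if --inferentia is declared as an ordinary value-taking option, the parser consumes the next token, --run-time-limit, as its value, which leaves 432000 as a stray positional argument. The two commands below are hypothetical stand-ins, not the actual Metaflow `batch step` definition, and whether the fork declares the option this way is an assumption.

```python
import click
from click.testing import CliRunner


# Hypothetical stand-in for `batch step`: --inferentia is a plain option,
# so click expects a value right after it.
@click.command()
@click.argument("step_name")
@click.option("--inferentia")
@click.option("--run-time-limit", default=432000)
def step_value_option(step_name, inferentia, run_time_limit):
    click.echo(f"inferentia={inferentia!r} run_time_limit={run_time_limit}")


# Same hypothetical command, but --inferentia is a boolean flag that
# takes no value.
@click.command()
@click.argument("step_name")
@click.option("--inferentia", is_flag=True, default=False)
@click.option("--run-time-limit", default=432000)
def step_flag_option(step_name, inferentia, run_time_limit):
    click.echo(f"inferentia={inferentia!r} run_time_limit={run_time_limit}")


# Tail of the command line from the message above, reduced to the
# arguments that matter for this sketch.
args = ["infer_video", "--inferentia", "--run-time-limit", "432000"]

runner = CliRunner()
broken = runner.invoke(step_value_option, args)
fixed = runner.invoke(step_flag_option, args)

print("value option exit code:", broken.exit_code)
print(broken.output)
print("flag option exit code:", fixed.exit_code)
print(fixed.output)
```

With is_flag=True the same argument list parses cleanly. The alternative is to keep a value-taking option and make sure the code that assembles the command line always passes an explicit value after --inferentia.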
melodic-train-1526
03/18/2022, 1:47 PM
square-wire-39606
03/18/2022, 11:14 PM
fresh-battery-60373
03/21/2022, 5:34 PM
melodic-train-1526
03/21/2022, 6:21 PM
melodic-train-1526
04/04/2022, 3:47 PM
ancient-application-36103
04/04/2022, 4:05 PM
melodic-train-1526
04/04/2022, 4:14 PM
square-wire-39606
04/13/2022, 3:23 PM
melodic-train-1526
04/13/2022, 3:27 PM
melodic-train-1526
04/13/2022, 3:27 PM
ancient-application-36103
04/13/2022, 3:28 PM
ancient-application-36103
04/13/2022, 3:29 PM
ancient-application-36103
04/13/2022, 3:30 PM
melodic-train-1526
04/13/2022, 3:30 PM