calm-helmet-54122
02/13/2025, 1:29 PM
2025-02-13 20:11:03.681 [1739448660858353/start/1 (pid 90836)] botocore.errorfactory.ClientException: An error occurred (ClientException) when calling the SubmitJob operation: Error executing request, Exception : Job name should match valid pattern, RequestId: 6c2102ce-3c0b-4f10-8e6a-37011dba3a35
2025-02-13 20:11:03.681 [1739448660858353/start/1 (pid 90836)] Data store error:
2025-02-13 20:11:03.735 [1739448660858353/start/1 (pid 90836)] No completed attempts of the task was found for task 'BranchFlow/1739448660858353/start/1'
2025-02-13 20:11:03.736 [1739448660858353/start/1 (pid 90836)]
2025-02-13 20:11:03.771 [1739448660858353/start/1 (pid 90836)] Task failed.
It seems like Batch is not happy with the way Metaflow names the job (BranchFlow/1739448660858353/start/1), but we can't find a way to make it work. Has anyone experienced the same error?
Thanks for the support.
calm-helmet-54122
02/13/2025, 1:30 PM
from metaflow import FlowSpec, step, batch

class BranchFlow(FlowSpec):

    @batch()
    @step
    def start(self):
        self.next(self.a, self.b)

    @batch()
    @step
    def a(self):
        self.x = 1
        self.next(self.join)

    @batch()
    @step
    def b(self):
        self.x = 2
        self.next(self.join)

    @batch()
    @step
    def join(self, inputs):
        print('a is %s' % inputs.a.x)
        print('b is %s' % inputs.b.x)
        print('total is %d' % sum(input.x for input in inputs))
        self.next(self.end)

    @batch()
    @step
    def end(self):
        pass

if __name__ == '__main__':
    BranchFlow()
calm-helmet-54122
02/13/2025, 1:30 PM
# Batch compute environment
resource "aws_batch_compute_environment" "main" {
  compute_environment_name = "metaflow-compute-env"
  compute_resources {
    max_vcpus           = 16
    min_vcpus           = 0
    security_group_ids  = [var.private_ml_platform_sg_id]
    subnets             = var.private_subnet_ids
    allocation_strategy = "BEST_FIT_PROGRESSIVE"
    type                = "EC2"
    instance_type       = ["optimal"]
    instance_role       = aws_iam_instance_profile.metaflow_ecs_task.arn
  }
  service_role = aws_iam_role.metaflow_batch_execution.arn
  type         = "MANAGED"
  depends_on   = [aws_iam_role_policy_attachment.aws_batch_service_role]
  tags = {
    CostCenter = var.tags_cost_center
    Project    = var.tags_project
    Service    = var.metaflow_tags_service
  }
}
# Batch job queue
resource "aws_batch_job_queue" "main" {
  name     = "metaflow-job-queue"
  state    = "ENABLED"
  priority = 1
  compute_environment_order {
    compute_environment = aws_batch_compute_environment.main.arn
    order               = 1
  }
  tags = {
    CostCenter = var.tags_cost_center
    Project    = var.tags_project
    Service    = var.metaflow_tags_service
  }
}
ancient-application-36103
02/13/2025, 2:49 PM
calm-helmet-54122
02/14/2025, 1:47 AM
(ds-churn-prediction-py3.11) ➜ ds-churn-prediction git:(develop) ✗ make run-metaflow-pipeline-test-batch
AWS_PROFILE=sandbox-remi-singapore METAFLOW_DEBUG=1 python -m src.pipeline.metaflow_pipeline.test_flow run
Metaflow 2.12.39 executing BranchFlow for user:remi.moise@ascenda.com
Validating your flow...
The graph looks good!
Running pylint...
Pylint not found, so extra checks are disabled.
2025-02-13 20:11:00.859 Workflow starting (run-id 1739448660858353):
2025-02-13 20:11:01.890 [1739448660858353/start/1 (pid 90836)] Task is starting.
2025-02-13 20:11:03.018 [1739448660858353/start/1 (pid 90836)] Traceback (most recent call last):
2025-02-13 20:11:03.018 [1739448660858353/start/1 (pid 90836)] File "/Users/remi.moise@ascenda.com/Library/Caches/pypoetry/virtualenvs/ds-churn-prediction-6VoP5-AC-py3.11/lib/python3.11/site-packages/metaflow/plugins/aws/batch/batch_cli.py", line 316, in step
2025-02-13 20:11:03.680 [1739448660858353/start/1 (pid 90836)] batch.launch_job(
2025-02-13 20:11:03.680 [1739448660858353/start/1 (pid 90836)] File "/Users/remi.moise@ascenda.com/Library/Caches/pypoetry/virtualenvs/ds-churn-prediction-6VoP5-AC-py3.11/lib/python3.11/site-packages/metaflow/plugins/aws/batch/batch.py", line 407, in launch_job
2025-02-13 20:11:03.680 [1739448660858353/start/1 (pid 90836)] self.job = job.execute()
2025-02-13 20:11:03.680 [1739448660858353/start/1 (pid 90836)] ^^^^^^^^^^^^^
2025-02-13 20:11:03.680 [1739448660858353/start/1 (pid 90836)] File "/Users/remi.moise@ascenda.com/Library/Caches/pypoetry/virtualenvs/ds-churn-prediction-6VoP5-AC-py3.11/lib/python3.11/site-packages/metaflow/plugins/aws/batch/batch_client.py", line 141, in execute
2025-02-13 20:11:03.680 [1739448660858353/start/1 (pid 90836)] response = self._client.submit_job(**self.payload)
2025-02-13 20:11:03.680 [1739448660858353/start/1 (pid 90836)] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-13 20:11:03.680 [1739448660858353/start/1 (pid 90836)] File "/Users/remi.moise@ascenda.com/Library/Caches/pypoetry/virtualenvs/ds-churn-prediction-6VoP5-AC-py3.11/lib/python3.11/site-packages/botocore/client.py", line 569, in _api_call
2025-02-13 20:11:03.680 [1739448660858353/start/1 (pid 90836)] return self._make_api_call(operation_name, kwargs)
2025-02-13 20:11:03.680 [1739448660858353/start/1 (pid 90836)] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-13 20:11:03.680 [1739448660858353/start/1 (pid 90836)] File "/Users/remi.moise@ascenda.com/Library/Caches/pypoetry/virtualenvs/ds-churn-prediction-6VoP5-AC-py3.11/lib/python3.11/site-packages/botocore/client.py", line 1023, in _make_api_call
2025-02-13 20:11:03.681 [1739448660858353/start/1 (pid 90836)] raise error_class(parsed_response, operation_name)
2025-02-13 20:11:03.681 [1739448660858353/start/1 (pid 90836)] botocore.errorfactory.ClientException: An error occurred (ClientException) when calling the SubmitJob operation: Error executing request, Exception : Job name should match valid pattern, RequestId: 6c2102ce-3c0b-4f10-8e6a-37011dba3a35
2025-02-13 20:11:03.681 [1739448660858353/start/1 (pid 90836)] Data store error:
2025-02-13 20:11:03.735 [1739448660858353/start/1 (pid 90836)] No completed attempts of the task was found for task 'BranchFlow/1739448660858353/start/1'
2025-02-13 20:11:03.736 [1739448660858353/start/1 (pid 90836)]
2025-02-13 20:11:03.771 [1739448660858353/start/1 (pid 90836)] Task failed.
2025-02-13 20:11:03.846 Workflow failed.
2025-02-13 20:11:03.846 Terminating 0 active tasks...
2025-02-13 20:11:03.846 Flushing logs...
Step failure:
Step start (task-id 1) failed.
make: *** [run-metaflow-pipeline-test-batch] Error 1
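One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: AWS Batch documents job names as at most 128 characters, starting with a letter or digit and otherwise containing only letters, digits, hyphens, and underscores. The run above reports user:remi.moise@ascenda.com, so if the submitted job name includes the username (an assumption about how Metaflow builds the name, not something the log shows), the `.` and `@` would violate that pattern, as would the `/` in the task path. The helper names below (`is_valid_batch_job_name`, `candidate_job_name`, `sanitize`) are hypothetical and only illustrate the check:

```python
import re

# Documented AWS Batch job-name rule: up to 128 characters, first character
# alphanumeric, the rest letters, digits, hyphens, or underscores.
BATCH_JOB_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,127}")


def is_valid_batch_job_name(name: str) -> bool:
    """Return True if `name` fits the documented AWS Batch job-name pattern."""
    return BATCH_JOB_NAME.fullmatch(name) is not None


def candidate_job_name(user, flow, run_id, step, task_id, retry=0):
    """Hypothetical helper: ASSUMES a hyphen-joined name; not Metaflow's API."""
    return "-".join([user, flow, str(run_id), step, str(task_id), str(retry)])


def sanitize(value: str) -> str:
    """Replace every character outside [A-Za-z0-9_-] with a hyphen."""
    return re.sub(r"[^A-Za-z0-9_-]", "-", value)


user = "remi.moise@ascenda.com"  # username as printed in the run output above
raw = candidate_job_name(user, "BranchFlow", 1739448660858353, "start", 1)
clean = candidate_job_name(sanitize(user), "BranchFlow", 1739448660858353, "start", 1)

print(raw, is_valid_batch_job_name(raw))
print(clean, is_valid_batch_job_name(clean))
print(is_valid_batch_job_name("BranchFlow/1739448660858353/start/1"))
```

If the username does turn out to be the culprit, one thing to try is overriding it for the run, for example by setting the METAFLOW_USER environment variable to a value without `.` or `@` before launching the flow.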
calm-helmet-54122
02/14/2025, 1:48 AM
ancient-application-36103
02/14/2025, 1:52 AM
ancient-application-36103
02/14/2025, 1:52 AM
calm-helmet-54122
02/14/2025, 1:56 AM