# ask-metaflow
h
Hi, we are encountering performance issues when executing Metaflow flows involving large fan-outs (500+ parallel processes). The primary symptom appears to be a database bottleneck, where the database struggles to commit metadata writes at the required speed. During these high-concurrency runs, we observe the following database-level errors:
2025-04-07 21:08:42.224 UTC [9378]: [1-1] db=metaflow,user=metaflow ERROR: duplicate key value violates unique constraint "steps_v3_flow_id_run_id_step_name_key"
My guess is that this specific error might result from initial write attempts timing out but eventually succeeding, causing subsequent retries to conflict with the unique constraint. Additionally, we see errors originating from the Metaflow services, such as:
failed (code 400): {"message": "need to register run_id and task_id first"}
Database monitoring during these periods shows significant spikes in query latency, CPU utilization, and wait events related to CPU contention. I have had success reducing these issues by limiting the max-workers for these jobs, and migrating to a database instance with higher CPU resources. Has anyone else run into these same situations? Are there best practices for tuning the Metaflow service or database to perform better?
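To illustrate the retry hypothesis above, here is a minimal sketch of how a repeated write trips a unique constraint and how a client could treat that as "already registered" rather than as a failure. It uses SQLite from the Python standard library as a stand-in for Postgres. The `steps` table and the `register_step` helper are hypothetical names for illustration only and are not the Metaflow service's actual schema or code.

```python
import sqlite3


def register_step(conn, flow_id, run_id, step_name):
    """Insert a step row; report a unique-constraint hit as 'already_exists'."""
    try:
        # The connection context manager commits on success and rolls back on error.
        with conn:
            conn.execute(
                "INSERT INTO steps (flow_id, run_id, step_name) VALUES (?, ?, ?)",
                (flow_id, run_id, step_name),
            )
        return "created"
    except sqlite3.IntegrityError:
        # An earlier attempt already committed this row, so the retry is a no-op.
        return "already_exists"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE steps ("
    "flow_id TEXT, run_id TEXT, step_name TEXT, "
    "UNIQUE (flow_id, run_id, step_name))"
)

# First attempt: assume it committed even though the caller saw a timeout.
first = register_step(conn, "MyFlow", "42", "start")
# Retry of the same write: conflicts with the row that is already there.
retry = register_step(conn, "MyFlow", "42", "start")
row_count = conn.execute("SELECT COUNT(*) FROM steps").fetchone()[0]

print(first, retry, row_count)
```

If the hypothesis holds, the duplicate key errors in the database log would be this kind of harmless conflict: the row exists exactly once and the retry changes nothing.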
a
what is the size of the db? we regularly run 50k-100k fanouts without any issues
h
Storage Size?
a
and the instance
some of these errors are red herrings
we need to clean them up from the logs
h
We are on GCP, using the db-perf-optimized-N-8 instance; the volume size is 20 GB SSD
a
have you looked into using alloydb?
h
hmm, did not realize they had yet another managed postgres-db product
Is that what you recommend? I'm just reading about it now, but using CloudSQL seemed like the easy choice.
a
you can definitely tune the current setup too - metadata service is very thin and doesn't do much work by design - depending on your usage patterns, you would have to tweak the resources.
s
let me know if you would like to chat live!
h
Sure, I'm in meetings for the next couple of hours but I can chat at 5 PM ET
s
invite sent!
a
also tagged you in our private slack channel