# dev-metaflow
m
Hello everyone! I’d like to open a pull request for the Metaflow Service repository. The change addresses the issue I opened two days ago: https://github.com/Netflix/metaflow-service/issues/448. It lets the user configure the timeout for accessing the PostgreSQL database. How can I open a pull request for someone to review my suggestion? @bulky-afternoon-92433, I think you could help me with this.
b
Heya. You should be able to fork the repo and open a PR from your fork's branch. Is there an issue with this? I had a quick look at the issue and figured I'd give some more context on what is possibly happening here. There is a known issue with some of the metadata service API routes where the service's memory consumption can spike unexpectedly. This is because none of the metadata service routes offer pagination. The route
/flows/{flow_id}
therefore gets all runs for a flow in one go, which, depending on the deployment, can be a massive set. I suspect this is why you're experiencing timeouts as well. There is an easy workaround for your case, though, as you're accessing a specific run. You should be able to forego the runs listing completely by instantiating a run directly with a pathspec:
from metaflow import Run
run = Run("FlowName/run_id")
I've been working on a revamp of the metadata service routes to introduce pagination and tackle the aforementioned issues, so a thorough fix should be coming 🙂 I don't think making the timeout configurable will properly solve the issue with the API routes, but on a completely separate note it would be a welcome addition.
It actually seems that the timeout is configurable now that I look at the
DBConfiguration
class. You can set the environment variable
MF_METADATA_DB_TIMEOUT=<your-value>
for the deployment and that should be picked up.
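For illustration, here is a minimal sketch of how a service could read a variable like this with a fallback. It is not the actual DBConfiguration code: the helper name read_db_timeout and the default of 60 are hypothetical, and the unit of the value is left unspecified because I haven't checked what the service expects.

```python
import os

# Hypothetical fallback value, for illustration only.
# It is NOT taken from the metadata service's real configuration.
DEFAULT_TIMEOUT = 60


def read_db_timeout(env=None, default=DEFAULT_TIMEOUT):
    """Return the timeout from MF_METADATA_DB_TIMEOUT, or `default`
    when the variable is unset, empty, non-numeric, or not positive."""
    # Accept an explicit mapping so the helper can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    raw = env.get("MF_METADATA_DB_TIMEOUT")
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        # Ignore values that are not whole numbers.
        return default
    return value if value > 0 else default
```

Calling read_db_timeout() with no arguments reads from the process environment, so exporting the variable in the deployment is enough for a helper like this to pick it up.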
m
@bulky-afternoon-92433, regarding the pull request question: there's no issue, that was just an honest question from someone who has never contributed to an open-source project 🙂 I was wondering if maybe I needed some sort of temporary role to open a pull request. Thanks for explaining the issue and its workaround, and also for pointing out the existence of the
MF_METADATA_DB_TIMEOUT
environment variable. I set it and it works as expected. Maybe it could be added to the documentation to make it more visible. Thanks again.