# ask-metaflow
Hi, we are looking for a job queue / scheduler for our training jobs using Metaflow. We use PyTorch and run a local K8s cluster with Argo. We would like to send all new jobs to a queue, have jobs wait until resources become available, and prioritize them so that a new high-priority job moves to the front of the line. It would also be great to be able to pause and restart a low-priority job when a high-priority one comes in. Can Metaflow do any of this out of the box? What is the go-to solution in this case? Volcano? Is it worth using Ray for this? Anything else? We considered W&B Launch, but it doesn't work with Metaflow. Thank you! Alex