# dev-metaflow
s
Hello folks, how easily and quickly can we release the toleration support? Do you see any issue if we send the pull request? https://github.com/Netflix/metaflow/compare/master...amerberg:metaflow:tolerations_argo My colleague @quiet-afternoon-68940 has worked on it, extending @narrow-lion-2703's initial work. We set some custom tolerations within the organization on a shared EKS cluster and need help with this. Appreciate the feedback and support. Thanks.
1
a
@some-answer-28872 can you provide some more context on your use case for tolerations?
Are these applied to all the jobs/pods submitted to Kubernetes, or only a few?
s
To avoid non-GPU workloads getting launched on GPU nodes. We don't use nodeSelector for many of our jobs, so if a workload is long-lived, Karpenter will not scale down the GPU node, thinking it still has work, and CPU-only tasks keep getting assigned to the GPU node.
We use tolerations on the GPU workloads to avoid that.
q
To put that a bit differently, we apply a taint to GPU nodes to repel pods that don't need GPUs, so we need a corresponding toleration for the pods that do need GPUs. So, e.g., a training step needs a toleration, whereas a preprocessing step does not.
2
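For reference, a minimal sketch of that taint/toleration pair using the Python kubernetes client. The taint key gpu-workload-only and node name gpu-node-1 are illustrative placeholders, not the actual cluster config from this thread:
```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# 1) Taint the GPU node so pods without a matching toleration are repelled.
v1.patch_node(
    "gpu-node-1",  # hypothetical node name
    {"spec": {"taints": [{"key": "gpu-workload-only",
                          "value": "true",
                          "effect": "NoSchedule"}]}},
)

# 2) Give pods that need GPUs (e.g. a training step) the matching toleration;
#    a preprocessing pod simply omits it, so the taint keeps it off GPU nodes.
gpu_toleration = client.V1Toleration(
    key="gpu-workload-only",
    operator="Equal",
    value="true",
    effect="NoSchedule",
)
```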
s
The alternative is to put a nodeSelector on all workloads, which is not a great solution and is not feasible for us. Would love it if there is another way to tackle this with Metaflow.
a
If you use https://github.com/NVIDIA/k8s-device-plugin as a daemon set on your cluster that contains GPU nodes, then simply doing
@kubernetes(gpu=2)
will assign 2 GPUs to a pod on a GPU node. Only GPU-specific jobs would be scheduled on these GPU nodes.
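For illustration, a sketch of how that looks in a flow, assuming the NVIDIA device plugin is installed as described above. The flow and step names are made up, not from this thread:
```python
from metaflow import FlowSpec, kubernetes, step


class GPUTrainFlow(FlowSpec):

    @kubernetes(cpu=2, memory=8000)  # CPU-only preprocessing: no GPU requested
    @step
    def start(self):
        # preprocessing work here
        self.next(self.train)

    @kubernetes(gpu=2)  # requests 2 GPUs, so the pod lands on a GPU node
    @step
    def train(self):
        # training work here
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    GPUTrainFlow()
```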
s
Hmm, I will have to check with my infra team.
[attachment: Screenshot 2022-11-08 at 11.05.57 AM.png]
They also use tolerations on that page to repel other jobs.
@square-wire-39606 @icy-exabyte-25104 is our infra engineer; he can answer better.
i
@ancient-application-36103 We do NOT have a nodeSelector across all our workloads, and we use dynamic scaling of nodes. Without tolerations, pods without any nodeSelector will get scheduled on GPU nodes.
We would have to add either a nodeSelector or antiAffinity to all our workloads.
a
@icy-exabyte-25104 are you using nvidia's k8s-device-plugin?
i
yes @ancient-application-36103 we are on version v0.12.2 of it
a
and are you using eks for kubernetes?
i
yes @ancient-application-36103
a
then in that case, you don't need any tolerations in the pod spec.
Are you seeing any non-gpu jobs launched via metaflow getting scheduled on gpu nodes?
i
@ancient-application-36103 It's not only Metaflow; any job with no nodeSelector or antiAffinity will get launched on a GPU node when the scheduler sees available CPU/RAM there.
The GPU node itself gets launched because a pod/workload demanded it. But once the machine becomes ready, it has extra CPU/RAM that will accept any pod unless there is a taint.
a
@icy-exabyte-25104 NVIDIA's k8s-device-plugin for GPUs will automatically taint the nodes, and pods (launched via Metaflow) that don't have GPUs specified (
@kubernetes(cpu=2)
) won't be able to tolerate the tainted GPU nodes. If you still want your non-GPU jobs (launched via Metaflow) to execute on GPU nodes, you can set
@kubernetes(gpu=0)
i
@ancient-application-36103 Again, we are NOT talking about Metaflow-launched jobs. We have a cluster with a lot of workloads; either we have to add a nodeSelector to all the other workloads, or tolerations to the Metaflow workloads.
a
Do you want to jump on a quick call to talk through options? The tolerations automatically get added to metaflow workloads - ideally you shouldn't have to do anything extra.
h
Did this thread ever get resolved? We're debating whether we want a shared cluster or our own, and things like this could make a difference.
s
@handsome-xylophone-36716 No, we ended up writing our own custom mutating admission controller: if the Metaflow label exists and a GPU is requested, it injects the toleration.
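For anyone considering the same route, a rough sketch of that kind of mutating webhook (not the actual controller from this thread; the Metaflow label check and the taint key are placeholders you would swap for your own):
```python
import base64
import json

from flask import Flask, jsonify, request

app = Flask(__name__)

GPU_RESOURCE = "nvidia.com/gpu"
TOLERATION = {
    "key": "gpu-workload-only",  # placeholder taint key
    "operator": "Equal",
    "value": "true",
    "effect": "NoSchedule",
}


def requests_gpu(pod_spec):
    """True if any container in the pod requests at least one GPU."""
    for container in pod_spec.get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        if int(limits.get(GPU_RESOURCE, 0)) > 0:
            return True
    return False


@app.route("/mutate", methods=["POST"])
def mutate():
    review = request.get_json()
    req = review["request"]
    pod = req["object"]

    labels = pod["metadata"].get("labels", {})
    # Placeholder label check: adjust to however your Metaflow pods are labeled.
    is_metaflow = labels.get("app.kubernetes.io/managed-by") == "metaflow"

    response = {"uid": req["uid"], "allowed": True}
    if is_metaflow and requests_gpu(pod["spec"]):
        # Append the toleration via a JSON patch; create the list if it is missing.
        if "tolerations" in pod["spec"]:
            patch = [{"op": "add", "path": "/spec/tolerations/-", "value": TOLERATION}]
        else:
            patch = [{"op": "add", "path": "/spec/tolerations", "value": [TOLERATION]}]
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()

    return jsonify({"apiVersion": "admission.k8s.io/v1",
                    "kind": "AdmissionReview",
                    "response": response})
```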
c
fwiw, I think Metaflow should support tolerations in its library. We worked around some of these issues by writing our own extension but it’s pretty brittle.
2
h
Agreed. Not sure of the exact details yet, but mixed workloads needing CPU + GPU flows are crucial to our needs (and not paying GPU prices).