# dev-metaflow
s
Hello folks, how easily and quickly can we release the toleration support? Do you see any issue if we send the pull request? https://github.com/Netflix/metaflow/compare/master...amerberg:metaflow:tolerations_argo My colleague @quiet-afternoon-68940 has worked on it, extending @narrow-lion-2703's initial work. We set some custom tolerations within the organization on a shared EKS cluster and need help with this. Appreciate the feedback and support. Thanks.
1
a
@some-answer-28872 can you provide some more context on your use case for tolerations?
Are these applied to all the jobs/pods submitted to Kubernetes, or only a few?
s
To avoid non-GPU workloads getting launched on GPU nodes. We don't use nodeSelector for many of our jobs, so if a workload is long-lived, Karpenter will not scale down the GPU node, thinking it still has work, and CPU-only tasks keep getting assigned to the GPU node.
We use tolerations on the GPU workloads to avoid that.
q
To put that a bit differently, we apply a taint to GPU nodes to repel pods that don't need GPUs, so we need a corresponding toleration for the pods that do need GPUs. So, e.g., a training step needs a toleration, whereas a preprocessing step does not.
2
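For reference, a minimal sketch of that taint/toleration pair using the Python kubernetes client. The taint key gpu-workload-only and node name gpu-node-1 are illustrative placeholders, not the actual cluster config from this thread:
```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# 1) Taint the GPU node so pods without a matching toleration are repelled.
v1.patch_node(
    "gpu-node-1",  # hypothetical node name
    {"spec": {"taints": [{"key": "gpu-workload-only",
                          "value": "true",
                          "effect": "NoSchedule"}]}},
)

# 2) Give pods that need GPUs (e.g. a training step) the matching toleration;
#    a preprocessing pod simply omits it, so the taint keeps it off GPU nodes.
gpu_toleration = client.V1Toleration(
    key="gpu-workload-only",
    operator="Equal",
    value="true",
    effect="NoSchedule",
)
```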
s
The alternative is to put a nodeSelector on all workloads, which is not a great solution and is not feasible for us. Would love it if there is another way to tackle this with Metaflow.
a
If you use https://github.com/NVIDIA/k8s-device-plugin as a daemon set on your cluster that contains GPU nodes, then simply doing
@kubernetes(gpu=2)
will assign 2 GPUs to a pod on a GPU node. Only GPU-specific jobs would be scheduled on these GPU nodes.
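For illustration, a sketch of how that looks in a flow, assuming the NVIDIA device plugin is installed as described above. The flow and step names are made up, not from this thread:
```python
from metaflow import FlowSpec, kubernetes, step


class GPUTrainFlow(FlowSpec):

    @kubernetes(cpu=2, memory=8000)  # CPU-only preprocessing: no GPU requested
    @step
    def start(self):
        # preprocessing work here
        self.next(self.train)

    @kubernetes(gpu=2)  # requests 2 GPUs, so the pod lands on a GPU node
    @step
    def train(self):
        # training work here
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    GPUTrainFlow()
```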
s
Hmm, I will have to check with my infra team.
[attachment: Screenshot 2022-11-08 at 11.05.57 AM.png]
They also use tolerations on that page to repel other jobs.
@square-wire-39606 @icy-exabyte-25104 is our infra engineer; he can answer better.
i
@ancient-application-36103 We do NOT have a nodeSelector across all our workloads, and we use dynamic scaling of nodes. Without tolerations, pods without any nodeSelector will get scheduled on GPU nodes.
We would have to add either a nodeSelector or antiAffinity to all our workloads.
a
@icy-exabyte-25104 are you using nvidia's k8s-device-plugin?
i
yes @ancient-application-36103 we are on version v0.12.2 of it
a
and are you using eks for kubernetes?
i
yes @ancient-application-36103
a
then in that case, you don't need any tolerations in the pod spec.
Are you seeing any non-gpu jobs launched via metaflow getting scheduled on gpu nodes?
i
@ancient-application-36103 It's not only Metaflow; any job with no nodeSelector or antiAffinity will get launched on a GPU node when the scheduler sees available CPU/RAM there.
The GPU node itself gets launched because a pod/workload demanded it. But once the machine becomes ready, it has extra CPU/RAM that will accept any pod unless there is a taint.
a
@icy-exabyte-25104 NVIDIA's k8s-device-plugin for GPUs will automatically taint the nodes, and pods (launched via Metaflow) that don't have GPUs specified (
@kubernetes(cpu=2)
) won't be able to tolerate the tainted GPU nodes. If you still want your non-GPU jobs (launched via Metaflow) to execute on GPU nodes, you can set
@kubernetes(gpu=0)
i
@ancient-application-36103 Again, we are NOT talking about Metaflow-launched jobs. We have a cluster with a lot of workloads; either we have to add a nodeSelector to all the other workloads, or tolerations to the Metaflow workloads.
a
Do you want to jump on a quick call to talk through options? The tolerations automatically get added to metaflow workloads - ideally you shouldn't have to do anything extra.
h
Did this thread ever get resolved? We're debating whether we want a shared cluster or our own, and things like this could make a difference.
s
@handsome-xylophone-36716 No, we ended up writing our own custom mutating admission controller: if the Metaflow label exists and a GPU is requested, it injects the toleration.
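For anyone considering the same route, a rough sketch of that kind of mutating webhook (not the actual controller from this thread; the Metaflow label check and the taint key are placeholders you would swap for your own):
```python
import base64
import json

from flask import Flask, jsonify, request

app = Flask(__name__)

GPU_RESOURCE = "nvidia.com/gpu"
TOLERATION = {
    "key": "gpu-workload-only",  # placeholder taint key
    "operator": "Equal",
    "value": "true",
    "effect": "NoSchedule",
}


def requests_gpu(pod_spec):
    """True if any container in the pod requests at least one GPU."""
    for container in pod_spec.get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        if int(limits.get(GPU_RESOURCE, 0)) > 0:
            return True
    return False


@app.route("/mutate", methods=["POST"])
def mutate():
    review = request.get_json()
    req = review["request"]
    pod = req["object"]

    labels = pod["metadata"].get("labels", {})
    # Placeholder label check: adjust to however your Metaflow pods are labeled.
    is_metaflow = labels.get("app.kubernetes.io/managed-by") == "metaflow"

    response = {"uid": req["uid"], "allowed": True}
    if is_metaflow and requests_gpu(pod["spec"]):
        # Append the toleration via a JSON patch; create the list if it is missing.
        if "tolerations" in pod["spec"]:
            patch = [{"op": "add", "path": "/spec/tolerations/-", "value": TOLERATION}]
        else:
            patch = [{"op": "add", "path": "/spec/tolerations", "value": [TOLERATION]}]
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()

    return jsonify({"apiVersion": "admission.k8s.io/v1",
                    "kind": "AdmissionReview",
                    "response": response})
```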
c
fwiw, I think Metaflow should support tolerations in its library. We worked around some of these issues by writing our own extension but it’s pretty brittle.
2
h
Agreed. Not sure of the exact details yet, but mixed workloads needing CPU + GPU flows are crucial to our needs (and not paying GPU prices).