# ask-metaflow
m
Hello All, I've noticed a slight discrepancy in how the `kubernetes` and `resources` decorators handle GPUs. The `kubernetes` decorator defaults to `None`, while the `resources` decorator defaults to `0`. Additionally, the `kubernetes` decorator updates the value of `gpu` if it is not `None` in the `resources` decorator: https://github.com/Netflix/metaflow/blob/master/metaflow/plugins/kubernetes/kubernetes_decorator.py#L280. Having a GPU limit of `0` results in the ExtendedResourceToleration admission controller adding a toleration to such steps, which means they can run on GPU nodes. This is a general issue with the admission controller, but it would be good if this default were `None` in the `resources` decorator too. Is it set to `0` intentionally, or is it a bug?
s
we are looking into this. cc @bulky-afternoon-92433
b
Currently pitching for changing the default to `None`, as it fixes part of the issues here. Still need to verify that some internal compute decorators are not affected. A related issue seems to be https://github.com/Netflix/metaflow/issues/2005, where GKE fails to schedule such pods altogether, whereas EKS runs them just fine. Out of curiosity, @mammoth-rainbow-82717, are you running on some managed Kubernetes service, or self-hosted?
m
We are running them on EKS.
> currently pitching for changing the default to None as it fixes part of the issues here.

This would make sense to me.