# ask-metaflow
I'm having some trouble getting GPUs working on AWS Kubernetes. I've got the nvidia-device-plugin installed, and my jobs are scheduling on my GPU instance when using `@kubernetes(gpu=1)`. I'm getting stuck trying to install `tensorflow-gpu`, though. When running the following code:
```python
from metaflow import FlowSpec, step, kubernetes, retry, conda_base

libraries = {
    "tensorflow-gpu": "2.11.1",
    "cudatoolkit": "11.0.3"
}

@conda_base(libraries=libraries, python="3.10.9")
class Tensorflow(FlowSpec):
    ...
```
I get this error on the start step:

```
Step: start, Error: command '['/opt/conda/condabin/mamba', 'create', '--yes', '--no-default-packages', '--name', 'metaflow_Tensorflow_linux-64_615c884d84bb1f7ae93a2e331b5189159b559cc8', '--quiet', b'python==3.10.9', b'requests==>=2.21.0', b'boto3==>=1.14.0', b'tensorflow-gpu==2.11.1', b'cudatoolkit==11.0.3']' returned error (1): b'Could not solve for environment specs\nEncountered problems while solving:\n  - nothing provides __cuda needed by tensorflow-2.11.1-cuda112py310he87a039_0\n\nThe environment can\'t be solved, aborting the operation\n\n{\n    "success": false\n}\n', stderr=b''
```
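My guess is that the environment is being resolved on a machine with no GPU driver, so mamba never sets the `__cuda` virtual package that this `tensorflow-gpu` build requires. Here's a rough sketch of what I was planning to try next, assuming the standard `CONDA_OVERRIDE_CUDA` override gets picked up by the mamba call Metaflow makes (the CUDA version and flow filename below are just placeholders I made up):

```python
import os
import subprocess

# Untested sketch: tell the conda/mamba solver to assume a CUDA driver is
# present so it can satisfy the __cuda virtual package during the solve.
# "11.2" and "tensorflow_flow.py" are placeholders, not values from my setup.
env = dict(os.environ, CONDA_OVERRIDE_CUDA="11.2")
subprocess.run(
    ["python", "tensorflow_flow.py", "--environment=conda", "run"],
    env=env,
    check=True,
)
```

(Or equivalently, exporting `CONDA_OVERRIDE_CUDA` in the shell before running the flow.) Not sure if that's the right approach here, though.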
I was able to shell into the running docker container, install tensorflow-gpu manually, and it worked fine. Any thoughts?