creamy-stone-99746
04/24/2024, 6:56 PM
@kubernetes, CUDA and pypi
The code works on AWS Batch.
I've deployed k8s with the NVIDIA device plugin for pods (https://github.com/NVIDIA/k8s-device-plugin/tree/main/deployments/helm/nvidia-device-plugin).
My pod can see the GPU using nvidia-smi.
However, the step below fails with the error shown after the decorators.
@pypi_base(
    python="3.11.7",
    packages={"torch": "2.2.1", "torchmetrics": ""},
    extra_indices=["https://download.pytorch.org/whl/cu118"],
)
@kubernetes(cpu=1, memory=6000, gpu=1, shared_memory=6000)
@environment(vars={"LD_LIBRARY_PATH": "/tmp", "NVIDIA_DRIVER_CAPABILITIES": "compute,utility"})
@step
OSError: libcudart.so.11.0: cannot open shared object file: No such file or directory
I've also tried using the Docker image pytorch/pytorch:2.2.1-cuda11.8-cudnn8-runtime, with the same error.
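To narrow this down, here is a minimal diagnostic sketch that could be run inside the failing step or pod. It is not part of the original flow: the helper names (`search_dirs`, `can_load`, `find_libs`) are hypothetical, and it assumes only the Python standard library. It prints the directories listed in LD_LIBRARY_PATH, asks the dynamic loader whether it can open libcudart.so.11.0, and looks for any libcudart files under the environment's site-packages directories.

```python
import ctypes
import os
import sys
from pathlib import Path


def search_dirs(env=None):
    """Split LD_LIBRARY_PATH into its directories, dropping empty entries."""
    env = os.environ if env is None else env
    return [d for d in env.get("LD_LIBRARY_PATH", "").split(":") if d]


def can_load(lib_name):
    """Return (True, None) if the loader can open lib_name, else (False, error text)."""
    try:
        ctypes.CDLL(lib_name)
        return True, None
    except OSError as exc:
        return False, str(exc)


def find_libs(roots, pattern="libcudart.so*"):
    """Recursively list files matching pattern under each existing root directory."""
    found = []
    for root in roots:
        path = Path(root)
        if path.is_dir():
            found.extend(sorted(str(match) for match in path.rglob(pattern)))
    return found


if __name__ == "__main__":
    print("LD_LIBRARY_PATH dirs:", search_dirs())

    ok, err = can_load("libcudart.so.11.0")
    print("libcudart.so.11.0:", "loadable" if ok else f"not loadable ({err})")

    # Only scan site-packages entries to avoid walking the whole working directory.
    site_dirs = [p for p in sys.path if p.endswith("site-packages")]
    print("libcudart files in site-packages:", find_libs(site_dirs))
```

If `find_libs` reports a libcudart file while `can_load` still fails, the library is present but not on the loader's search path. If nothing is found, the CUDA runtime was probably never installed into the environment.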