polite-house-75767
05/10/2023, 11:15 AM
2023-05-10 15:12:19.801 [29/train/87 (pid 1652)] [pod t-z5c62-k8s7l] numba.cuda.cudadrv.error.CudaSupportError: Error at driver init:
2023-05-10 15:12:19.801 [29/train/87 (pid 1652)] [pod t-z5c62-k8s7l]
2023-05-10 15:12:19.801 [29/train/87 (pid 1652)] [pod t-z5c62-k8s7l] CUDA driver library cannot be found.
2023-05-10 15:12:19.801 [29/train/87 (pid 1652)] [pod t-z5c62-k8s7l] If you are sure that a CUDA driver is installed,
2023-05-10 15:12:19.801 [29/train/87 (pid 1652)] [pod t-z5c62-k8s7l] try setting environment variable NUMBA_CUDA_DRIVER
2023-05-10 15:12:19.801 [29/train/87 (pid 1652)] [pod t-z5c62-k8s7l] with the file path of the CUDA driver shared library.
2023-05-10 15:12:19.801 [29/train/87 (pid 1652)] [pod t-z5c62-k8s7l] :
2023-05-10 15:12:19.804 [29/train/87 (pid 1652)] [pod t-z5c62-k8s7l] <flow TestDollyGPUSetup step train> failed:
2023-05-10 15:12:21.903 [29/train/87 (pid 1652)] [pod t-z5c62-k8s7l] Internal error
2023-05-10 15:12:30.577 [29/train/87 (pid 1652)] Kubernetes error:
2023-05-10 15:12:30.577 [29/train/87 (pid 1652)] Error (exit code 1). This could be a transient error. Use @retry to retry.
2023-05-10 15:12:30.657 [29/train/87 (pid 1652)]
Any help, please?