Are GPUs supposed to work in the vanilla AWS setup...
# ask-metaflow
t
Are GPUs supposed to work in the vanilla AWS setup using Terraform? I have been struggling a lot with jobs that get stuck in the RUNNABLE state. After a whole day if debugging, I have figured out that the AMI that is being used does not seem to support GPUs (
amzn2-ami-ecs-hvm-2.0.20230321-x86_64-ebs
). If I manually copy the instances that gets automatically started by Metaflow and chnage the AMI to
amzn2-ami-ecs-gpu-hvm-2.0.20230321-x86_64-ebs
and also manually adds these to my cluster, then I can get the flow to run. One other thing that seems strange to me it the name of the cluster is called
metaflow-cpu-vccg...
(note the cpu in there) I have tried specifying gpu=1 using bot
@resource(gpu=1)
and
@batch(gpu=1)