# ask-metaflow
Hi all, could I please get some help with multi-node, multi-GPU distributed training on Metaflow? I tried adapting this PyTorch Lightning `ClusterEnvironment` for multi-node multi-GPU training, but I'm stuck on deriving `LOCAL_RANK`, which isn't provided as an environment variable when using `@pytorch_parallel`. Thanks in advance!
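For context, here is a minimal sketch of the arithmetic usually used to relate these ranks. It assumes one process per GPU, the same number of GPUs on every node, and global ranks assigned contiguously node by node. The helper names `derive_ranks` and `global_rank_from` are hypothetical and not part of Metaflow or PyTorch Lightning; how the global rank and GPU count are obtained under `@pytorch_parallel` is not covered here.

```python
from typing import Tuple


def derive_ranks(global_rank: int, gpus_per_node: int) -> Tuple[int, int]:
    """Split a global rank into (node_rank, local_rank).

    Assumes one process per GPU, a uniform GPU count per node, and
    contiguous rank assignment (node 0 holds ranks 0..gpus_per_node-1, etc.).
    """
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    if global_rank < 0:
        raise ValueError("global_rank must be non-negative")
    # divmod returns (quotient, remainder): the quotient is the node index,
    # the remainder is the process index within that node.
    node_rank, local_rank = divmod(global_rank, gpus_per_node)
    return node_rank, local_rank


def global_rank_from(node_rank: int, local_rank: int, gpus_per_node: int) -> int:
    """Inverse mapping under the same assumptions."""
    return node_rank * gpus_per_node + local_rank
```

Under these assumptions, a custom `ClusterEnvironment` could return the second element as its local rank. If the processes are launched differently (for example, one process per node that spawns its own workers), this mapping does not apply as written.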