curved-island-17262
06/26/2023, 3:44 PM
Regarding @pytorch_parallel: my compute environment for GPU training jobs currently only supports a g5.8xlarge, meaning I can only run single-GPU training jobs. If I use the @pytorch_parallel decorator, should I be able to train a model across multiple of these instances, just as in a foreach but in an unbounded manner?
Would it be better to initially just use a g5.24xlarge, which has 4 GPUs, so that I would not need @pytorch_parallel, which is quite experimental? Also, there would be more I/O communication overhead between the smaller g5 instances compared to the single larger instance.
Are there any known issues with the current implementation of the @pytorch_parallel decorator, and are there any examples I can refer to?