yep, we have a `@parallel` decorator implemented in preparation for such use cases that require gang scheduling. We have tested it with distributed PyTorch/TF in the past and it works.
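To give a rough idea, here's a minimal sketch of what a gang-scheduled training step could look like, assuming a Metaflow-style `FlowSpec`; since `@parallel` isn't a stable feature yet, treat the specific names (`num_parallel`, `current.parallel`) as illustrative assumptions rather than a final API:

```python
from metaflow import FlowSpec, step, parallel, current

class DistributedTrainFlow(FlowSpec):

    @step
    def start(self):
        # Fan out into a gang-scheduled group of 4 tasks that are
        # started together, as distributed training frameworks require.
        self.next(self.train, num_parallel=4)

    @parallel
    @step
    def train(self):
        # Each task sees its own rank and the world size, which you
        # would feed into torch.distributed / tf.distribute setup.
        self.rank = current.parallel.node_index
        self.world_size = current.parallel.num_nodes
        self.next(self.join)

    @step
    def join(self, inputs):
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    DistributedTrainFlow()
```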
We haven't announced it as a stable feature yet, since it seems most everyone is in the same situation as you: they love having the option to go distributed in the future, but in practice they don't need it today 🙂
Today you can get really far with large multi-GPU instances, which often outperform basic distributed setups anyway.
Let us know when you actually want to start testing it and we are happy to explore `@parallel` and other related features with you.