Awesome to see official support for distributed training!
We are using Metaflow on AWS, but we are planning a hybrid approach that uses a local K8s cluster with GPUs as alternate compute. The idea is to keep a common metadata service, UI, and storage on AWS. So I have two questions.
• One of our main use cases for the local K8s cluster is distributed training. Is there any ETA for the new features to support K8s as well?
• The hybrid approach looks feasible to me, but I'm not really sure. Are there any challenges here?
Thanks a lot 🙏