important-london-94970
03/18/2025, 1:57 PM
current.torch.run, @torchrun, @checkpoint, logging my Trainer artifacts to current.checkpoint.directory?
Would someone be willing to whip up a quick example, or give me some help and I can provide one? I'm training a model in a distributed step and then want to load the model into self.model in my join step.
important-london-94970
03/18/2025, 3:31 PM