important-london-94970
03/18/2025, 1:57 PM
`current.torch.run`, `@torchrun`, `@checkpoint`, logging my Trainer artifacts to `current.checkpoint.directory`?
Would someone be willing to whip up a quick example, or give me some help and I can provide one? I'm training a model in a distributed step and then want to load the model into `self.model` in my join step.
important-london-94970
03/18/2025, 3:31 PM