# ask-metaflow
Hello everyone, I was wondering whether there is any documentation on best practices for model checkpointing, particularly for multi-node distributed training. I came across this: https://github.com/zillow/metaflow/blob/25cf84301fed372d80d3f9cbd788e86b3ef32132/metaflow/plugins/frameworks/checkpoints.py