When setting up an image processing/training job using a Hugging Face-based model, what are the accepted best practices for loading data onto Batch instances on AWS for repeated experimentation?

I have a modestly sized set of images, roughly 100 GB, and I want to experiment with various ViT/Swin models to classify a prelabeled image dataset. I can (and have, locally) create a Hugging Face dataset using their directory structure to assign labels (i.e., "(dataset-name)/(train|test|validation)/(label)" as the directory structure).

Is the idea behind metaflow.S3 that I should just create that Hugging Face dataset in my own S3 bucket, then copy it into every instance on creation? (That seems like a lot of setup time per job, and also kind of a pain if the instance hasn't been allocated enough disk space, but I'm not sure I'm understanding things properly.)
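
For concreteness, here's roughly what I'm picturing — a minimal sketch of "copy the dataset into every instance on creation," assuming the bucket prefix mirrors the imagefolder layout ("s3://my-bucket/my-image-dataset/train/<label>/img.jpg", etc.). The bucket name, prefix, and resource sizes are placeholders.

```python
import os
import shutil

from metaflow import FlowSpec, S3, batch, step


class ImageClassifierFlow(FlowSpec):

    @batch(cpu=8, memory=64000)  # placeholder resource requests
    @step
    def start(self):
        local_root = os.path.abspath("my-image-dataset")

        # Pull every object under the dataset prefix onto the task's local disk.
        # S3.get_all() downloads into a temp dir that is cleaned up when the
        # context manager exits, so files are moved out before leaving the block.
        with S3(s3root="s3://my-bucket/my-image-dataset/") as s3:
            for obj in s3.get_all():
                dest = os.path.join(local_root, obj.key)
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                shutil.move(obj.path, dest)

        # Recreate the Hugging Face dataset from the restored directory layout;
        # splits come from train/test/validation dirs, labels from subfolders.
        from datasets import load_dataset
        self.num_train = load_dataset("imagefolder", data_dir=local_root)["train"].num_rows

        self.next(self.end)

    @step
    def end(self):
        print(f"train examples: {self.num_train}")


if __name__ == "__main__":
    ImageClassifierFlow()
```

Is this the intended pattern, or is there a better way to avoid re-downloading ~100 GB at the start of every run?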