salmon-furniture-31733
10/19/2023, 3:03 PM
dataset_A
...file-1
...file-2
dataset_B
...file-1
...file-2
The bucket has versioning enabled. There are scenarios where only some files in a dataset folder get updated, and some
During training, we read each file using Metaflow's S3 client.
I can't use tools like DVC because we want the files to stay human-readable as normal S3 objects, and the datasets don't live in any individual repo.
I'm currently looking for a way to reproduce training runs, including the exact version of each file used. Ideally this would reuse S3's versioning, but other options are fine too.
I can think of a few ways to do this inside Metaflow runs. Just checking whether there are any recommended best practices for it.
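One of the ways I had in mind, as a rough sketch only: record the latest S3 version ID of every file under a dataset prefix at the start of the run, then read each file pinned to that version. This uses plain boto3 calls (`list_object_versions`, `get_object` with `VersionId`) rather than Metaflow's S3 client, since I'm not sure the latter exposes version IDs. The function names `snapshot_versions` and `read_pinned` are made up for illustration, and `s3_client` is assumed to be a `boto3.client("s3")`.

```python
def snapshot_versions(s3_client, bucket, prefix):
    """Return {key: version_id} for the current version of each object under prefix.

    Assumes s3_client behaves like boto3.client("s3") and that the bucket
    has versioning enabled. Keys whose latest entry is a delete marker are
    skipped, because delete markers are not listed under "Versions".
    """
    manifest = {}
    paginator = s3_client.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for version in page.get("Versions", []):
            if version["IsLatest"]:
                manifest[version["Key"]] = version["VersionId"]
    return manifest


def read_pinned(s3_client, bucket, manifest):
    """Fetch each key at exactly the version recorded in the manifest."""
    contents = {}
    for key, version_id in manifest.items():
        response = s3_client.get_object(
            Bucket=bucket, Key=key, VersionId=version_id
        )
        contents[key] = response["Body"].read()
    return contents
```

Inside a flow, the manifest could be stored as a regular artifact (for example `self.manifest = snapshot_versions(...)` in the first step), so a later run can pass the same dict to `read_pinned` and get byte-identical inputs even if some files have been updated since.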