# ask-metaflow
Hi, what's the best way to do dataset versioning inside of Metaflow? We currently have our datasets as individual files in S3.
```
dataset_A/
    file-1
    file-2
dataset_B/
    file-1
    file-2
```
The particular bucket is version-enabled. There are scenarios where only some files in a dataset folder get updated while others remain unchanged. During training, we read each file using Metaflow's S3 client.

I can't use tools like DVC, as we want the files to stay human-readable as normal S3 objects, and the datasets don't live in any individual repo.

I'm currently looking for a way to reproduce training runs, including the exact version of each file used, ideally reusing S3's versioning, but other options are fine too. I could think of a few ways to do this inside Metaflow runs (see the sketch below); just checking if there are any recommended best practices for it.
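For context, here's a minimal sketch of one approach I was considering. It uses boto3 directly (I'm not sure Metaflow's S3 client exposes version IDs), and the bucket and prefix names are placeholders: record each file's current `VersionId` at the start of the run, store the manifest as a Metaflow artifact so it's versioned with the run, then read the pinned versions during training.

```python
# Sketch only: pin S3 object versions per run for reproducibility.
# "my-data-bucket" and "dataset_A/" are hypothetical placeholders.
import boto3
from metaflow import FlowSpec, step

BUCKET = "my-data-bucket"
PREFIX = "dataset_A/"

class PinnedDatasetFlow(FlowSpec):

    @step
    def start(self):
        s3 = boto3.client("s3")
        # Record the current VersionId of every file in the dataset.
        # Stored as a Metaflow artifact, so it travels with the run.
        self.manifest = {}
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
            for obj in page.get("Contents", []):
                head = s3.head_object(Bucket=BUCKET, Key=obj["Key"])
                # VersionId is present because the bucket is version-enabled.
                self.manifest[obj["Key"]] = head["VersionId"]
        self.next(self.train)

    @step
    def train(self):
        s3 = boto3.client("s3")
        for key, version_id in self.manifest.items():
            # Fetch the exact pinned version, even if the object has
            # since been overwritten in the versioned bucket.
            body = s3.get_object(
                Bucket=BUCKET, Key=key, VersionId=version_id
            )["Body"].read()
            # ... feed `body` into training ...
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    PinnedDatasetFlow()
```

Re-running training against an old run would then just mean loading that run's `manifest` artifact and fetching the same version IDs. Is something like this reasonable, or is there a more idiomatic Metaflow pattern?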