# ask-metaflow
Does Metaflow keep track of the datasets a run accesses? For example, if a user reads a dataset from S3 during training, does Metaflow record which dataset was read? We want to collect model lineage so that, at the end of a training run, we know exactly what data was used and can emit that to our lineage tracking systems. We currently use MLflow, but its `autolog` feature is buggy. I see that `metaflow.S3` has built-in support for lineage and versioning, and I'd like to explore that through some examples.
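One way to get this kind of lineage without depending on autologging is to record each dataset read explicitly and emit a single record at the end of the run. The sketch below is a minimal, hypothetical helper: `LineageRecorder`, `DatasetAccess`, `to_payload`, the bucket name, and the JSON payload shape are all illustrative assumptions, not part of Metaflow or MLflow. It fingerprints each read with SHA-256 so that a later overwrite of the same S3 URL shows up as a different hash. The Metaflow calls mentioned in the docstring (`S3(s3root=...)`, `s3.get`, `S3Object.url`, `S3Object.blob`, `current.flow_name`, `current.run_id`) are the only real APIs assumed; the class itself is plain Python so it can be tested outside a flow.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DatasetAccess:
    """One dataset read during a run (hypothetical record shape)."""
    url: str
    size_bytes: int
    sha256: str


@dataclass
class LineageRecorder:
    """Hypothetical helper: collects dataset reads, then emits one payload.

    Inside a Metaflow step you might use it like this (bucket is a placeholder):
        with S3(s3root="s3://my-bucket/data/") as s3:
            obj = s3.get("train.csv")
            recorder.record(obj.url, obj.blob)
        self.lineage = recorder.to_payload(f"{current.flow_name}/{current.run_id}")
    Assigning to `self` stores the payload as a Metaflow artifact of the run.
    """
    accesses: List[DatasetAccess] = field(default_factory=list)

    def record(self, url: str, data: bytes) -> DatasetAccess:
        # Hash the bytes actually read, so the record pins the content, not just the URL.
        entry = DatasetAccess(
            url=url,
            size_bytes=len(data),
            sha256=hashlib.sha256(data).hexdigest(),
        )
        self.accesses.append(entry)
        return entry

    def to_payload(self, run_id: str) -> str:
        # Serialize all reads for an external lineage system (payload format is an assumption).
        return json.dumps(
            {"run": run_id, "datasets": [asdict(a) for a in self.accesses]},
            sort_keys=True,
        )


if __name__ == "__main__":
    rec = LineageRecorder()
    rec.record("s3://my-bucket/data/train.csv", b"a,b\n1,2\n")
    print(rec.to_payload("TrainFlow/123"))
```

Because the payload is stored as a run artifact, it is versioned alongside the model, and an `end` step (or a separate job reading the run's artifacts) could forward it to MLflow or any other lineage tracker.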