For the group: has anyone figured out (even rudimentary) data or artifact lineage with Metaflow?
I love Maria's point in this LI post. A huge advantage of Databricks over Metaflow is that it's got Unity Catalog. So if you query a table and then use it to train a model, or use a model to produce inferences and write them to a table, that lineage actually gets tracked.
So you never find yourself asking "wait, if we delete this table (or rename a column), which things downstream will break?"
We actually had this issue with Postgres at my last company. We used Postgres in RDS as our data lakehouse for years lol. When we finally wanted to shut it off and migrate, it was full of tables and we had no way to find out which pipelines/services read from and wrote to each of them.
Services like DataHub exist for lineage tracking, but Metaflow doesn't integrate with it.
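Even a rudimentary homegrown version would answer the "what breaks downstream" question. Here's a rough sketch, assuming each job just reports which assets it read and wrote. `LineageGraph`, `record`, `downstream_of`, and the asset names are all hypothetical; this isn't a Metaflow, Unity Catalog, or DataHub API. It's plain Python that stores input→output edges and walks them breadth-first.

```python
from collections import deque


class LineageGraph:
    """Hypothetical minimal lineage store: edges point from an input asset to an output asset."""

    def __init__(self):
        self._downstream = {}  # asset name -> set of assets produced from it

    def record(self, inputs, outputs):
        # Record that one job read every asset in `inputs` and wrote every asset in `outputs`.
        for src in inputs:
            self._downstream.setdefault(src, set()).update(outputs)

    def downstream_of(self, asset):
        # Breadth-first search: every asset that transitively depends on `asset`.
        seen = set()
        queue = deque([asset])
        while queue:
            node = queue.popleft()
            for child in self._downstream.get(node, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen


# Hypothetical pipeline: raw table -> model -> scores table -> dashboard.
graph = LineageGraph()
graph.record(inputs=["table:raw_events"], outputs=["model:churn_v1"])
graph.record(inputs=["model:churn_v1", "table:customers"], outputs=["table:churn_scores"])
graph.record(inputs=["table:churn_scores"], outputs=["dashboard:retention"])

print(sorted(graph.downstream_of("table:raw_events")))
# ['dashboard:retention', 'model:churn_v1', 'table:churn_scores']
```

In a Metaflow flow you'd presumably call something like `record()` from each step that touches a table or model and persist the edges somewhere shared. The hard part is getting every pipeline to report consistently, which is exactly what Unity Catalog does for you automatically.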
cc: @bored-vr-66208