# ask-metaflow
l
I took a stab at logging lineage for a metaflow flow using OpenLineage and Marquez. The concept is super exciting.
ooh 2
The green gear is a flow. The purple gears are steps of a flow, or they are single calls to `execute_sql()` within a step of a flow. There are 3 levels of gears ("jobs"). I didn't set up column-level lineage. That would probably take some effort with `sqlglot`. But I was able to represent an ML model as a `dataset` object, which is really cool.
cc: @bored-vr-66208 @millions-army-77308 @rich-scientist-42991
👀 1
f
@lively-lunch-9285 awesome work, I've thought about this quite a bit too over the years. Last time I looked, it seemed like for it to work natively it would have to live in the OL project itself: https://github.com/OpenLineage/OpenLineage/tree/main/integration with each integration basically being a thin wrapper around the existing project's CLI
which, last time I looked, kind of forced users to use the OL version, not sure if that changed
l
Oh, hmm... I'm not an expert but: my script is just a script that imports `openlineage-python` and logs metadata. So you could certainly "instrument" metaflow flows via a decorator or by inheriting from FlowSpec
f
looking at the dbt docs, looks like that's still the case: https://openlineage.io/docs/integrations/dbt
l
I can't tell what the purpose of the code in the official OL repo is
I see. But does `openlineage-dbt` strictly need to live in that monorepo (is that the issue)?
f
yeah seems like it's not the best way to do it but all the heavy hitters in orchestration are listed there
💯 1
f
yeah I guess OL is a spec so we should be able to make it work. Unclear what @straight-shampoo-11124 meant there: whether `@track_lineage` was a generic example of a custom decorator or it's something you can subclass specifically for lineage
b
This is saaawwwwweeeeettttt!!!!! 🔥 🚀 🚀🚀🚀🚀🚀🚀
🤣 1
Nice share @lively-lunch-9285 !
@straight-shampoo-11124, what did you mean by the `@track_lineage` decorator?
f
kind of confused about what their integration thing in the monorepo is though, and haven't really seen anyone using the openlineage-specific PyPI packages out in the wild
💡 1
l
@flaky-plumber-70709 from what I've gathered:
1. Outerbounds is working on native instrumentation for OL right now.
2. But we don't have to wait. We can make our own decorators that use `openlineage-python` to log the metadata
🙌 1
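A rough sketch of what such a home-made decorator could look like. This is not the `@track_lineage` mentioned earlier in the thread, and it deliberately does not call `openlineage-python`: it only builds plain dicts shaped like OpenLineage run events (`eventType`, `eventTime`, `run`, `job`, `inputs`, `outputs`, `producer`) and hands them to an `emit` callable you supply. The names `track_lineage`, `make_event`, `PRODUCER`, and the dataset names are all made up for illustration.

```python
import functools
import uuid
from datetime import datetime, timezone

# Hypothetical identifier for the code that produces the events.
PRODUCER = "https://example.com/metaflow-lineage-sketch"


def make_event(event_type, run_id, namespace, job_name, inputs=(), outputs=()):
    """Build a dict shaped like an OpenLineage run event (a sketch, not the full spec)."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": name} for name in inputs],
        "outputs": [{"namespace": namespace, "name": name} for name in outputs],
        "producer": PRODUCER,
    }


def track_lineage(emit, namespace, inputs=(), outputs=()):
    """Wrap a function so it emits START, then COMPLETE or FAIL, via `emit`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # One run id ties the START event to its COMPLETE/FAIL event.
            run_id = str(uuid.uuid4())
            job_name = func.__name__
            emit(make_event("START", run_id, namespace, job_name, inputs, outputs))
            try:
                result = func(*args, **kwargs)
            except Exception:
                emit(make_event("FAIL", run_id, namespace, job_name, inputs, outputs))
                raise
            emit(make_event("COMPLETE", run_id, namespace, job_name, inputs, outputs))
            return result

        return wrapper

    return decorator


# Usage: collect events in a list instead of sending them anywhere.
events = []


@track_lineage(events.append, namespace="demo", inputs=["raw.orders"], outputs=["models.churn"])
def train():
    return "trained"


train()
print([event["eventType"] for event in events])  # ['START', 'COMPLETE']
```

To actually send these to Marquez you would replace the dicts with the event classes from `openlineage-python` and pass the client's emit method as `emit`; the wrapper logic should stay about the same. The same wrapper could presumably sit on a Metaflow step method, though that is untested here.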
b
@lively-lunch-9285 , You get the hero of the week award!! 🏆
Best thing I’ve seen related to Metaflow all week!!
l
lol, sadly it's still a lot of work to add this to your projects
b
True
f
doubletime by starting the convo on an oss metaflow vscode extension which is sorely needed too
🤣 1
l
Dude, want to contribute? I think we can get pretty far in a 1-2 day hackathon
Although if I'm honest, ZenML has a VS Code extension that has never really worked for me, so I worry we'd be setting ourselves up for a similarly bad experience with metaflow
f
well that is probably more due to a scoping issue, trying to do too much at once before releasing it out to the wild - you kind of only get that 1 first impression with that kind of release
but I'm down, I'm free the next few days if you want to PM me we can set up a pairing session
l
It'd probably be 2-3 weeks before I actually make it happen 😅 My job is in a bit of a crisis atm
f
ahh well I'm still down, we can get something in the books and then chat async till then. I'm a jetbrains person so haven't written any extensions but typescript has been my daily driver the last few years (mainly cdk, nuxt)
l
Perfect, I'm a failed VS Code extension dev myself
• ClearML
• BentoML
Gotten POCs through a few times--learned a lot about the VS Code API... and then moved on heheh
f
nice - yeah, optimistic this time around bc I think LLMs are on another level / have more context around the API since those attempts, and not being a solo dev would make things a bit easier
l
True!
f
but learning from the zenml case, I think starting small with some observability / simple cli patterns would be the key. Are there any extensions that you find are particularly awesome wrt UX
l
Honestly, if you wanted to look at the code of one of those extensions, it has a lot of overlap with how we'd likely want to set up the Metaflow one
• helps you make sure you have a venv active
• calls Python CLIs via a subprocess
• registers "commands" and hooks them up with buttons on a sidebar
I actually took a lot of inspiration and code from the `black` extension, so I think a lot of the setup is pretty standard for all Python-related extensions
f
cool - I'll take a gander. I think maybe the first move would be configuration management, things like setting METAFLOW_PROFILE with a viewer. I have time tomorrow so I'll poke around your repos
🙌 1
l
Oh yes, IIRC, the ClearML extension does that. You can configure the extension to point to a custom config file path (because ClearML uses one in a very similar way to metaflow) or it defaults to the standard location
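For the "custom path, else standard location" lookup described above, here is a minimal sketch of how a helper might resolve which Metaflow config file to show. It assumes Metaflow's convention of reading `config.json`, or `config_<profile>.json` when `METAFLOW_PROFILE` is set, from `METAFLOW_HOME` (defaulting to `~/.metaflowconfig`); worth double-checking against the Metaflow source. The function name and the `override` argument (standing in for an extension setting) are hypothetical.

```python
import os
from pathlib import Path


def resolve_metaflow_config(env=None, override=None):
    """Return the config file path a viewer would open (a sketch under stated assumptions)."""
    env = os.environ if env is None else env

    # Hypothetical extension setting: an explicit path wins over everything else.
    if override:
        return Path(override).expanduser()

    # Assumed Metaflow convention: METAFLOW_HOME, else ~/.metaflowconfig.
    home = Path(env.get("METAFLOW_HOME", "~/.metaflowconfig")).expanduser()

    # Assumed Metaflow convention: a named profile selects config_<profile>.json.
    profile = env.get("METAFLOW_PROFILE")
    filename = f"config_{profile}.json" if profile else "config.json"
    return home / filename


# Usage: pass a fake environment so nothing depends on the real machine.
print(resolve_metaflow_config(env={"METAFLOW_HOME": "/tmp/mf", "METAFLOW_PROFILE": "dev"}))
```

A real extension would do this lookup in TypeScript; the Python version is only meant to pin down the resolution order.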
f
cool I'll mess around w that a bit and then DM you when I have anything sharable tom
👍 1
m
😮 Looks like we came in at just the right time!