# dev-metaflow
a
I've made some further improvements/cleanups to my `metaflow.api` prototype discussed above (see README); I think I will work on opening a draft PR and/or some issues about it.

There are a few pieces that seem orthogonal/separable, but I'm not sure they really are:
1. New flow-creation API (primarily the `metaflow.api/Flow` metaclass)
2. New command-line form: `metaflow flow <file>[:flow] …`
3. `pytest` tests/utilities
4. Support for Flow composition (via inheritance)

I'm not sure how worthwhile it will be to try separating them. It may add nontrivial work/review overhead, and may only muddy things; as currently implemented, I think they do benefit from each other in various ways. otoh, it may be too disruptive of a change to land in one swoop. Open to others' thoughts there. One big draft PR to start may be easiest.

Some updates since my previous post:
• More test coverage of foreach, split-and, and join steps
• Better support for running different flows within the same python process (including expanding various global singletons related to `Parameter`s, tracking which flow is currently the "main" flow being run, and ensuring the corresponding custom CLI flags are added to `click`)
• Syntactic sugar: default to the only flow in a file when running like `metaflow flow <file> run …` (the full/explicit form is `metaflow flow <file>:<flow> run …`)
• Syntactic sugar: optionally take `self.input` as an argument to a "foreach" step (example)
• At Celsius we have a single logical flow that was written as 4 separate flows, which are run in sequence by ad-hoc scripting that loses many of Metaflow's benefits. I've used the composition-by-inheritance in my branch to turn that into one `Flow` that mixes in the 4 underlying flows, and run it in production, so we'll likely be using a fork of Metaflow in production going forward.

Interested in any further thoughts people have.
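For anyone following along, here is a rough sketch of how the `<file>[:flow]` form and the "default to the only flow in the file" sugar could work. This is not code from the prototype: `parse_flow_spec` and `pick_flow` are hypothetical helpers, and the rule of only treating the suffix as a flow name when it is a valid identifier is an assumption made for this example.

```python
from typing import Optional, Tuple


def parse_flow_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split '<file>[:flow]' into (file, flow name or None).

    Hypothetical helper for illustration, not taken from the prototype.
    """
    path, sep, name = spec.rpartition(":")
    # Only treat the suffix as a flow name if it looks like a Python
    # identifier; this keeps paths such as 'C:\\flows\\train.py' intact.
    if sep and path and name.isidentifier():
        return path, name
    return spec, None


def pick_flow(flows: dict, name: Optional[str]):
    """Return the requested flow, defaulting to the only flow in the file."""
    if name is not None:
        return flows[name]
    if len(flows) == 1:
        return next(iter(flows.values()))
    raise ValueError(
        f"expected exactly one flow, found {sorted(flows)}; use <file>:<flow>"
    )
```

With a file that defines a single flow, `pick_flow(flows, None)` returns it; with several, the explicit `<file>:<flow>` form is required.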
one immediate order of business is for me to rebase on a more recent upstream 🙂 I'm currently downstream of d6f961ea0062af317263bf0ae2156f5e161b72fd from April 2
v
nice! There's a bunch of stuff here so I will need a while to digest it all 😄
l
I had a broader question on `Parameter` management based on this way of flow composition. Since flow composition is based on the inheritance of flows, there can potentially be a lot of parameters for the final child flow. How will the instantiation of steps and access to these parameters take place in this compositional scenario? Meaning, if `param1` is part of the child class and `param2` is part of the parent class, will a child class step have access to the `param2` attribute? Some parameters like `learning_rate` could be common across different flows and could clash. What happens in such cases? What is the flexibility while invoking large flows with lots of parameters (can't CLI everything)?
👍 1
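To make the question concrete, here is what plain Python inheritance does with class-level attributes. The thread does not say how the prototype resolves parameters, so treat this only as the baseline behaviour a metaclass would start from. `Parameter` below is a stand-in class, not Metaflow's, and the flow names plus `collect_parameters` / `find_clashes` are hypothetical.

```python
class Parameter:
    """Stand-in for a flow parameter; NOT Metaflow's real Parameter class."""

    def __init__(self, name, default=None):
        self.name = name
        self.default = default


class ParentFlow:
    param2 = Parameter("param2", default=10)
    learning_rate = Parameter("learning_rate", default=0.1)


class OtherFlow:
    learning_rate = Parameter("learning_rate", default=0.01)


class ChildFlow(ParentFlow, OtherFlow):
    param1 = Parameter("param1", default="a")


def collect_parameters(flow_cls):
    """Map each parameter name to (defining class, Parameter); first in the MRO wins."""
    found = {}
    for klass in flow_cls.__mro__:
        for attr, value in vars(klass).items():
            if isinstance(value, Parameter) and attr not in found:
                found[attr] = (klass.__name__, value)
    return found


def find_clashes(flow_cls):
    """Return parameter names defined by more than one class in the MRO."""
    owners = {}
    for klass in flow_cls.__mro__:
        for attr, value in vars(klass).items():
            if isinstance(value, Parameter):
                owners.setdefault(attr, []).append(klass.__name__)
    return {name: classes for name, classes in owners.items() if len(classes) > 1}
```

Under these assumptions `ChildFlow.param2` resolves to the parent's definition, and a clash on `learning_rate` is silently won by whichever mixin comes first in the method resolution order, which is why an explicit check like `find_clashes` may be worth having.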
s
yep, that's a good point about Parameters. When I've been thinking about flow composition previously, I came to the conclusion that there are a few different scenarios that require different types of treatment.
one such case is that you have one large flow that you have divided into multiple subflows just to keep it manageable, but you control the data flow and artifacts, so you can handle name clashes too. This is like unhygienic macros in various programming languages like C. Another case is that you want to use a 3rd party subflow, in which case you shouldn't have to worry about name clashes. This is like using a module / package in Python. Yet another interesting case is when a 3rd party super-flow calls a subflow that you define. Such a generic superflow/wrapper could e.g. handle hyperparameter search by running any given flow with different parametrizations. In this case, you want to hoist Parameters from your subflow to the wrapper somehow.
It definitely requires some thinking about how to handle the different cases cleanly
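One possible shape for the "hoisting" case, purely as a sketch: the wrapper exposes each subflow's parameters under a namespaced key, so two subflows can both have a `learning_rate` without clashing, and a generic search wrapper can enumerate parametrizations. Nothing here is Metaflow API; `hoist`, `lower`, `grid`, and the dot-separated naming are all assumptions for illustration.

```python
from itertools import product


def hoist(subflow_name, params):
    """Expose a subflow's parameters on the wrapper under namespaced keys."""
    return {f"{subflow_name}.{key}": value for key, value in params.items()}


def lower(subflow_name, hoisted):
    """Inverse of hoist: recover one subflow's own parameters from the wrapper's."""
    prefix = subflow_name + "."
    return {
        key[len(prefix):]: value
        for key, value in hoisted.items()
        if key.startswith(prefix)
    }


def grid(search_space):
    """Yield every parametrization from {name: [candidate values]}."""
    names = sorted(search_space)
    for values in product(*(search_space[name] for name in names)):
        yield dict(zip(names, values))


# Two subflows share a parameter name but stay distinct on the wrapper.
wrapper_params = {
    **hoist("train", {"learning_rate": 0.1}),
    **hoist("evaluate", {"learning_rate": 0.5}),
}
```

A hyperparameter-search wrapper could then iterate over `grid({"train.learning_rate": [0.1, 0.01]})` and pass `lower("train", candidate)` to the subflow for each run.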