agreeable-toddler-27168
07/15/2021, 11:35 PMagreeable-toddler-27168
07/15/2021, 11:35 PMagreeable-toddler-27168
07/15/2021, 11:36 PMstraight-shampoo-11124
07/16/2021, 1:21 PMstraight-shampoo-11124
07/16/2021, 1:28 PMpytest
are definitely something we want to support in the future.
Other ideas like getting rid of self.next
are more controversial and they would break all existing flows or add complexity, if both styles were supported. Hence it is hard to consider merging this PR as isstraight-shampoo-11124
07/16/2021, 1:35 PMagreeable-toddler-27168
07/16/2021, 2:25 PMmetaflow.api
and write flows the new way. I’m also open to a different name for that package.
self.next
hasn’t gone away, it’s just hidden from the user when they use the new metaflow.api
version of `FlowSpec`/`@step` (and new `@foreach`/`@join` that also live in the new package)agreeable-toddler-27168
07/16/2021, 2:27 PMagreeable-toddler-27168
07/16/2021, 3:04 PMstrong-secretary-3596
07/29/2021, 12:53 PMstraight-shampoo-11124
07/29/2021, 2:47 PMstraight-shampoo-11124
07/29/2021, 2:48 PMagreeable-toddler-27168
07/29/2021, 3:37 PMstraight-shampoo-11124
07/30/2021, 12:21 AMagreeable-toddler-27168
07/30/2021, 4:03 AMagreeable-toddler-27168
07/30/2021, 4:05 AMagreeable-toddler-27168
07/30/2021, 4:05 AMstraight-shampoo-11124
07/30/2021, 3:56 PMagreeable-toddler-27168
07/30/2021, 9:03 PMstraight-shampoo-11124
07/30/2021, 11:01 PMcreamy-magician-26843
08/04/2021, 10:48 PMstraight-shampoo-11124
08/04/2021, 11:55 PMstraight-shampoo-11124
08/04/2021, 11:57 PMcreamy-magician-26843
08/04/2021, 11:58 PMstraight-shampoo-11124
08/05/2021, 12:02 AMcreamy-magician-26843
08/05/2021, 12:04 AMagreeable-toddler-27168
08/11/2021, 5:04 PMagreeable-toddler-27168
08/24/2021, 5:39 PMpytest
tests existing alongside existing test/core
tests
• metaflow flow …
CLI seems like a good thing overall; some concern about letting there be 2 ways to invoke flows (python <file>
still works, when __main__
handler is present, as is always the case today). Will that confuse users?
• Need to drill into which bits of #612 actually depend on one another; right now a lot of things are separate but in sequence, though some of them could be completely factored out and not require any of the other bits to go in first. I’ve filed #661, #662, #663, #665, and #666 today, to start that process.
Composing Flows
As currently written in #612, inheritance only works with “new-style” `metaflow.api.FlowSpec`s that omit start
and end
steps, because I punted on trying to handle steps with the same name being mixed in from multiple flows.
An open question is whether/how to namespace steps (in a composed flow, after composition) based on the flow they came from. For example:
class Flow1(FlowSpec):
@step
def start(self): self.next(self.end)
@step
def end(self): pass
class Flow2(FlowSpec):
@step
def start(self): self.next(self.end)
@step
def end(self): pass
class Flow12(Flow1, Flow2): pass
What should the 4 steps in Flow12
be named?
• Flow1_start
, …, Flow2_end
(underscore-delimited): probably easiest to do, but a bit hacky/brittle; steps can have underscores in their names already, so it’s not a robust way to namespace steps.
• Flow1/start
, …, Flow2/end
(slash-delimited): seems cleanest, but might interact undesirably with existing assumptions around metastore directory structure. Or, perhaps it will Just Work, and introduce a new level of directory hierarchy that is never ambiguous?
• Flow1.start
, …, Flow2.end
(dot-delimited): might give the best of all worlds, but might also break assumptions about step names.
• or, maybe `Flow12`'s steps should actually appear to Metaflow as just {Flow1
,Flow2
} x {start
,end
}, and the Flow12
metadata should just point to invocations of those `Flow1`/`Flow2` steps, somehow?agreeable-toddler-27168
08/24/2021, 5:39 PM3.x
branch and I can focus on landing things there 🤷.
Next Steps
In general, @straight-shampoo-11124 felt he and the other maintainers have some homework to do in thinking through the large possible design space around composing flows. There’s also a fair amount of stuff in #612 that can go in first (either because later stuff depends on it, so it has to go in first, or because it’s orthogonal, and can go in separately whenever). As mentioned above, I’ve started filing PRs with some of that stuff: #661, #662, #663, #665, #666.
lmk if I missed or misrepresented anything @straight-shampoo-11124, and as always I’m interested in folks’ thoughts about these topics more generally. Perhaps we can move them to more focused GitHub issues as well.agreeable-toddler-27168
08/24/2021, 5:42 PMancient-application-36103
08/24/2021, 8:21 PM.ipynb
files by default?agreeable-toddler-27168
08/24/2021, 8:36 PMpapermill
to execute some notebooks on data artifacts (computed by previous steps in the flow); the notebooks in that case are “source code” analogous to .py
files that would be included in the source package sent to Batch.
We are able to get them included by augmenting the METAFLOW_DEFAULT_PACKAGE_SUFFIXES
env var, but the tutorials’ dealings with notebooks seemed similar to this, so I thought it might be necessary there as well, and useful to add by default. Let me know if that’s not the case.ancient-application-36103
08/24/2021, 8:43 PM.ipynb
files. Also, these files can be rather voluminous (given that they may contain execution state). Unless there is a different pressing concern for including notebooks by default, we should err on the side of caution and tightly limit what gets packaged by default.agreeable-toddler-27168
08/24/2021, 8:47 PM.py
files as an alternate way to run the flows. My mistake, I’ll close that PRancient-application-36103
08/24/2021, 8:58 PMstraight-shampoo-11124
08/24/2021, 11:27 PM@card
#659 and @kubernetes
#644 as well as refactorings originating from Netflix). After these get merged, thinking through the flow composition is a high priority which directly relates to your PR.
Thanks for your patience 🙏