# dev-metaflow
w
I'm interested in learning how Metaflow works on Step Functions for DAG scheduling. Currently Step Functions has a `Next` semantic instead of a `DependsOn` one, which makes optimal DAG scheduling impossible when a node depends on multiple parents and a parent is a dependency of multiple nodes. The only way to run a logical DAG correctly is to schedule its nodes by generations, where each generation is one layer of Parallel state. This creates a problem when the runtime of a single node in a generation is skewed (it keeps running long after the rest have finished): the entire next generation is blocked. That can happen all the time because of upstream data delays, but I want the workflow to keep making as much progress as it can. I found your Slack channel hoping to learn how Metaflow solves this problem. I came from this post: https://netflixtechblog.com/unbundling-data-science-workflows-with-metaflow-and-aws-step-functions-d454780c6280
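To make the skew problem concrete, here is a minimal sketch in plain Python. It does not use Step Functions or Metaflow; the four-node graph, the durations, and both function names are hypothetical and chosen only for illustration. It compares when each node finishes if nodes start as soon as their own parents are done versus when every generation waits for the whole previous generation.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical DAG (node -> set of parents). "A" is a parent of two nodes,
# and "D" has two parents. "B" is the skewed, long-running node.
PARENTS = {"A": set(), "B": set(), "C": {"A"}, "D": {"A", "B"}}
DURATION = {"A": 1, "B": 10, "C": 1, "D": 1}


def finish_times_by_dependency(parents, duration):
    """DependsOn-style: a node starts once all of its own parents finish."""
    finish = {}
    for node in TopologicalSorter(parents).static_order():
        start = max((finish[p] for p in parents[node]), default=0)
        finish[node] = start + duration[node]
    return finish


def finish_times_by_generation(parents, duration):
    """Layered: a generation starts only after the whole previous one ends."""
    generation = {}
    for node in TopologicalSorter(parents).static_order():
        # Roots land in generation 0; others sit one layer below their
        # deepest parent.
        generation[node] = 1 + max(
            (generation[p] for p in parents[node]), default=-1
        )
    finish = {}
    barrier = 0  # time at which the previous generation fully completed
    for g in range(max(generation.values()) + 1):
        members = [n for n in generation if generation[n] == g]
        for n in members:
            finish[n] = barrier + duration[n]
        barrier = max(finish[n] for n in members)
    return finish


if __name__ == "__main__":
    print("by dependency:", finish_times_by_dependency(PARENTS, DURATION))
    print("by generation:", finish_times_by_generation(PARENTS, DURATION))
```

In this toy graph, "C" only needs "A", yet under layered scheduling it cannot start until "B" has also finished, which is the blocking described in the message above.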
a
@witty-horse-28982 Happy to elaborate. Metaflow only allows a subset of DAGs as valid workflows. In your example, the L13 to L34 transition would not be valid within Metaflow.
w
Hi Savin, thanks for reaching out to me so quickly. So basically your workflow is exactly like a state machine, and you cannot support the following scenario, where `Node 2` is a dependency of two children.
a
Correct - such a workflow won't be supported.
w
thank you Savin
Why is that? I am curious about the design decision.
It can be a common scenario in feature engineering.
a
One way of making this a valid Metaflow graph would be to have node 1 point to node 2 rather than node A. Of course there are many ways to organise your workflow; task dependent versus data dependent is one dimension, for example. Metaflow workflows are task dependent, primarily because task-dependent workflows are a bit easier to reason about than data-dependent ones.
w
thank you Savin for explaining.