witty-horse-28982
03/03/2022, 3:06 AM
Next semantics instead of DependsOn semantics, which makes optimal DAG scheduling impossible when a node depends on multiple parents and a parent is a dependency of multiple nodes.
The only thing we can do to run a logical DAG correctly is to schedule DAG nodes by generations, where each generation is one layer of Parallel state. This creates a problem: if the runtime of a single node in one generation is skewed (it keeps running long after the rest have finished), the entire next generation is blocked. That can happen all the time due to upstream data delays. But I want the entire workflow to keep making as much progress as it can.
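To make the skew problem concrete, here is a minimal Python sketch (not Step Functions code). The DAG, the node names n1..n5, and the runtimes are all made up for illustration, and it assumes unlimited parallelism. It compares finish times when nodes run generation by generation with a barrier between layers against finish times when each node starts as soon as its own parents are done.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical DAG: node -> set of parents it depends on.
# n2 is a parent of both n3 and n4; n3 has two parents.
deps = {
    "n1": set(),
    "n2": set(),
    "n3": {"n1", "n2"},
    "n4": {"n2"},
    "n5": {"n3", "n4"},
}
# Hypothetical runtimes in minutes; n1 is the skewed (slow) node.
runtime = {"n1": 60, "n2": 5, "n3": 5, "n4": 30, "n5": 5}


def generations(deps):
    """Group nodes into layers; each layer depends only on earlier layers."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    layers = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        layers.append(ready)
        ts.done(*ready)
    return layers


def finish_by_generation(deps, runtime):
    """Each layer starts only after the whole previous layer has finished."""
    finish, barrier = {}, 0
    for layer in generations(deps):
        for node in layer:
            finish[node] = barrier + runtime[node]
        barrier = max(finish[node] for node in layer)
    return finish


def finish_by_dependency(deps, runtime):
    """Each node starts as soon as its own parents have finished."""
    finish = {}
    for node in TopologicalSorter(deps).static_order():
        start = max((finish[parent] for parent in deps[node]), default=0)
        finish[node] = start + runtime[node]
    return finish


layered = finish_by_generation(deps, runtime)
eager = finish_by_dependency(deps, runtime)
print("by generation:", layered)
print("by dependency:", eager)
```

In this made-up example n4 only needs n2, but the generation barrier makes it wait for the slow n1 as well, so the whole run ends at minute 95 instead of minute 70.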
I found your Slack channel while trying to learn how Metaflow solves this problem.
Came from this post: https://netflixtechblog.com/unbundling-data-science-workflows-with-metaflow-and-aws-step-functions-d454780c6280
ancient-application-36103
03/03/2022, 4:03 AM
witty-horse-28982
03/03/2022, 4:14 AM
Node 2 is a dependency of two children.
ancient-application-36103
03/03/2022, 4:23 AM
witty-horse-28982
03/03/2022, 4:25 AM
witty-horse-28982
03/03/2022, 4:25 AM
witty-horse-28982
03/03/2022, 4:26 AM
ancient-application-36103
03/03/2022, 5:30 AM
witty-horse-28982
03/03/2022, 6:13 AM