# ask-metaflow
s
Hello. We have a multi-flow data pipeline that is currently manual, i.e. a person runs the commands to deploy and trigger each flow as an Argo workflow, one after the other. Because it has been developed this way, each flow in the process has ended up with its own parameters, some of which are required. We want to move to using @trigger_on_finish to launch the next flow without the manual intervention of a person, while still allowing us to trigger flows manually if needed, but the parameters are causing an issue. Is there a way to pass parameters when creating the template, or some other way that doesn't force the previous flow in the chain to know about all the downstream parameters?
Worst case, we'll end up doing the work to change all the parameter handling to work around this, but it would be nice if there is a way to avoid that work.
s
Hello Emily! These parameters that you mention: are they dependent on any of the steps that you run in an upstream flow?
Or are they just unique to the flow you would want to run as part of the triggered flow?
I guess what I am asking is: are the values of these parameters generated as part of running an upstream flow, and do they then need to find their way into a downstream flow?
s
No, they are things like the Google Drive folder ID to put an output report in, or the name of a dataset to use.
I get what you are saying; we do have some like that, but we also already have a parameter for the upstream flow, so I can see how we can do that already. 🙂
s
In your particular use case, is it acceptable to pass the universe of parameters to the first flow in the daisy chain?
s
We could do; that is one of the options I'm looking at. The fun bit is still being able to start the process from any of the flows and to handle the parameters well.
The thing I'm trying to avoid is all of the flows having to know about all of their downstream dependencies and what all their parameters are.
s
Oh interesting, so in addition to the daisy chain you do want to retain the ability for the flows to run independently as well?
s
yeah
s
> The thing I'm trying to avoid is all of the flows having to know about all of their downstream dependencies and what all their parameters are

Makes sense. The option I was thinking about was passing the universe of parameters into the first flow in the daisy chain. You could assign these parameters to `self` and then access them downstream: https://docs.metaflow.org/production/event-triggering/flow-events
> When using @trigger_on_finish, you can access information about the triggering runs through current.trigger.run or current.trigger.runs in the case of multiple flows, which return one or more Run objects. Use the Run object to access artifacts as you do when using the Client API directly.
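A minimal sketch of that pattern, assuming made-up flow, parameter, and artifact names (in practice each flow lives in its own module, and the fallback used for manual runs is up to you):

```python
from metaflow import FlowSpec, Parameter, current, step, trigger_on_finish


# first_flow.py: accepts the whole universe of parameters up front and
# stashes the ones it does not use itself as artifacts on self.
class FirstFlow(FlowSpec):
    doc_folder_root = Parameter(
        "doc-folder-root", default="", help="Passed through for downstream flows"
    )

    @step
    def start(self):
        self.doc_folder_root_value = self.doc_folder_root  # persisted as an artifact
        self.next(self.end)

    @step
    def end(self):
        pass


# second_flow.py: triggered when FirstFlow finishes; reads the value back
# from the triggering run, or keeps its own parameter when run manually.
@trigger_on_finish(flow="FirstFlow")
class SecondFlow(FlowSpec):
    doc_folder_root = Parameter(
        "doc-folder-root", default="", help="Used when the flow is run on its own"
    )

    @step
    def start(self):
        if current.trigger:  # set only when the run was event-triggered
            self.folder = current.trigger.run.data.doc_folder_root_value
        else:
            self.folder = self.doc_folder_root
        self.next(self.end)

    @step
    def end(self):
        pass
```

This keeps each flow runnable on its own while letting the triggered path inherit values from the upstream run instead of redeclaring them everywhere.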
The parameters themselves: if they do not have any dynamic portions, then you could consider passing them into the workflow template as env vars?
s
Ooh, that's an interesting idea. Thank you, that sounds like it might work. I'll have a play and see how we get on.
👍🏽 1
Got something that looks hopeful using a combination of environment variables and deploy-time parameters: https://docs.metaflow.org/production/scheduling-metaflow-flows/scheduling-with-aws-step-functions#deploy-time-parameters I've created a function that returns a function that gets the named env var or a default value.
```python
import os

from metaflow import Parameter


def env_var_parameter(default: str = None):
    # Returns a deploy-time function: Metaflow calls it when the workflow
    # template is created, so the value is read from the deploying
    # environment rather than at run time.
    def get_env_var(context):
        return os.getenv(context.parameter_name, default)

    return get_env_var


# Inside the FlowSpec class:
doc_folder_root = Parameter(
    "doc-folder-root",
    help="The docs folder to write the report to",
    default=env_var_parameter("UID_HERE"),
)
```
Then I modified our Metaflow launch script to create the environment variables if it's a deploy and they are provided.
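For reference, a hedged sketch of what such a launch script could look like; the file name, env var names, and the helper are hypothetical rather than the actual script from this thread. It relies on deploy-time parameter functions being evaluated while `argo-workflows create` runs, so the variables only need to exist in the environment of the deploy command:

```python
import os
import subprocess
import sys


def deploy(flow_file, env_overrides):
    # Hypothetical launch-script helper: only set the variables that were
    # actually provided, so parameters whose env var is absent fall back to
    # the default baked into env_var_parameter().
    env = os.environ.copy()
    env.update({name: value for name, value in env_overrides.items() if value})

    # Deploy-time parameter functions run while the Argo workflow template is
    # being created, so the values are captured at this point.
    subprocess.run(
        [sys.executable, flow_file, "argo-workflows", "create"],
        env=env,
        check=True,
    )


if __name__ == "__main__":
    # Example: bake a folder ID into the template at deploy time.
    deploy("report_flow.py", {"doc-folder-root": os.getenv("DOC_FOLDER_ROOT")})
```

Parameters can still be overridden when a flow is triggered manually, so the values baked in at deploy time only matter when nothing else supplies one.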