# ask-metaflow
b
Hello, what is the right approach if I want to schedule 2 flows (on Argo) that have exactly the same code but differ in 1 parameter? For example, something like the following I want to run hourly with `--model val1`, and run another with `--model val2`. They should be independent from each other; however, they share the same code.
```python
import logging

from metaflow import FlowSpec, Parameter, project, schedule, step

logger = logging.getLogger(__name__)


@project(name="hello_flow")
@schedule(hourly=True)
class HelloWorld(FlowSpec):

    model = Parameter(
        "model",
        required=True,
        help="e.g. val1, val2, etc.",
    )

    @step
    def start(self) -> None:
        logger.info("Hello World run started!")
        self.next(self.greet)

    @step
    def greet(self) -> None:
        logger.info(" ####### Hello World from %s! #######", self.model)
        self.next(self.end)

    @step
    def end(self) -> None:
        logger.info("Hello World run completed!")


if __name__ == "__main__":
    HelloWorld()
```
We tried with this and it failed:
```shell
$ poetry run python hello_flow/flow.py --model val1 argo-workflows create
pyproject_name: hello-flow
Usage: flow.py [OPTIONS] COMMAND [ARGS]...
Try 'flow.py --help' for help.

Error: no such option: --model
```
d
Look at configs. Instead of parameter, use config.
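To make the suggestion concrete: the idea is to move `model` from a run-time `Parameter` to a deploy-time config, so each Argo deployment bakes in its own value while the flow code stays identical. A minimal, Metaflow-free sketch of the per-deployment config files this implies (file names and keys are illustrative, not anything Metaflow mandates):

```python
import json
import tempfile
from pathlib import Path

# One config file per independent deployment; the flow code stays the same.
models = ["val1", "val2"]

with tempfile.TemporaryDirectory() as d:
    for model in models:
        Path(d, f"config_{model}.json").write_text(json.dumps({"model": model}))

    # Each deployment then reads its own baked-in value.
    loaded = json.loads(Path(d, "config_val1.json").read_text())
    print(loaded["model"])
```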
b
oh that's new! you mean this, right?
d
Yes. And yes new.
b
cool thanks! will try it and let you know how it goes. Appreciate it
h
@dry-beach-38304 to confirm my understanding: every time we execute `poetry run python hello_flow/flow.py --model <someval> argo-workflows create`, it will overwrite the Argo cron workflow definition, with the only change being the value for `model`? it will not create k Argo cron workflows if I provide k different values for `model`. we would try on our side, but need to update the `metaflow` version to test.
for our use case, we do want to deploy k independent Argo cron workflows where the only difference is the config/parameters. the only other way I could think to do this is to upload our target flow as an Argo workflow template and then use some sort of event triggering for the k configs, where we trigger and pass the config to the template, but I figure there has to be something simpler.
d
You can name the workflow differently. You can use project/branch for that purpose (I think for Argo it mangles that into the name).
h
> You can name the workflow differently.
hmmm, ideally, these k versions should live on `master` and be deployed as "production" via `--production`. not sure if we want to manage branches for this use case, as we want to treat them as independent flows that happen to use the same flow definition. branches feel targeted towards the experimentation use case, where at some point you want to merge back to some production branch.
maybe this boils down to this use case: Dev 1 has built a somewhat generic/parameterized Flow A for Task X; how could Dev 2 most efficiently reuse Flow A for some different Task Y, assuming Dev 2 only needs to change params/config to be suitable for Task Y.
a
@hundreds-wire-22547 would you like to see k different workflow templates being deployed? if so, you can change the name of the deployed template using `--name`:
`python flow.py argo-workflows --name foo create`
h
yeah, in our case k cron workflows
s
yep, `--name` might suffice. of course, you can also just create a single deployment and trigger it using different parameter values.
you can also script up the deployment bits using the deployer api
also, to add to this - the goal with branches is indeed being able to deploy variants of the same flow - the use case may vary - it might be experimentation oriented or it can be the use case that you outlined. in all scenarios - these variants execute within their own metaflow namespace - so you can be sure that they are not stepping over each other.
h
cc @great-egg-84692
d
Sorry, am out so a bit slow. To continue on this though: agreed with @square-wire-39606 re usage of branch. You can actually even have `--branch` and `--production` at the same time. It will do something like `name.prod.branch` instead of `name.test.branch` for a regular branch, or `name.prod` for just production. Deployer works great and you can double it up with configs too. One thing that comes to mind as I am writing this is that I don't think the project decorator takes branch and production flags (it only does through the CLI). That's something we could add though (need to discuss, but it makes a lot of sense in light of the addition of configs). You could then simply iterate over your configs and everything would be taken care of that way.
h
@dry-beach-38304 possible to share a simple example of deployer + configs?
d
let me try to get something out soon. Just got back in so catching up.
OK, this will need this: https://github.com/Netflix/metaflow/pull/2200 which should merge in the next release but with that (or if you just want to use the branch off of that PR), you should be able to do something like this (not tested but the idea should be pretty close):
```python
from metaflow import Config, FlowSpec, config_expr, project, schedule

@project(name="hello_flow", branch=config_expr("config.model"))
@schedule(hourly=True)
class HelloWorld(FlowSpec):

    config = Config("config", required=True)
...
```
Then you can have various configs like:
```
# config1.json
{
  "model": "val1",
  ...
}
```
etc. (`config2.json`, …). Then you can do something like this to deploy:
```python
from metaflow import Deployer

for config_file in ["config1.json", ...]:
    Deployer("myflow.py", config=[("config", config_file)]).argo_workflows().create()
```
That will deploy stuff with projects called `hello_flow.test.val1`, etc. You can also add `production=True` to the `@project` to get `hello_flow.prod.val1`.
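To summarize the naming scheme described in this thread, here is a tiny sketch of the pattern. This mirrors the messages above (`name.test.<branch>`, `name.prod`, `name.prod.<branch>`), not Metaflow's actual implementation, so treat the helper as illustrative only:

```python
def mangled_project(name, branch=None, production=False):
    # Illustrative reconstruction of the project-name mangling
    # described in the thread; not Metaflow's real code.
    if production and branch:
        return f"{name}.prod.{branch}"
    if production:
        return f"{name}.prod"
    if branch:
        return f"{name}.test.{branch}"
    return name

print(mangled_project("hello_flow", branch="val1"))                   # hello_flow.test.val1
print(mangled_project("hello_flow", branch="val1", production=True))  # hello_flow.prod.val1
```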
h
Thanks @dry-beach-38304! Could I also iterate through configs with a different arg for `--name` as an alternative to branch?
d
you can already do that without 2200, yes.
2200 adds branch/production, but the current state is that you can already set `name` in the `@project` decorator.
note: I think you can also pass config values programmatically to the deployer. I have to check (I forgot what I did), i.e. you don't necessarily have to go through a file.
file is useful if you already have one but if you are generating one in your script I don’t think you need to write it out.
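If your version doesn't support passing config values programmatically, generating a temporary file inside the deploy script is a workable fallback. A sketch of that idea; the `config_to_cli` helper is hypothetical, and the `--config <name> <path>` CLI shape is an assumption based on this thread (check `--help` on your Metaflow version):

```python
import json
import tempfile

def config_to_cli(flow_path, config_name, config_dict):
    """Dump an in-memory config to a temp JSON file and return the
    equivalent deploy command. Hypothetical helper; the --config flag
    shape is an assumption, not verified against a specific version."""
    f = tempfile.NamedTemporaryFile("w", suffix=".json", delete=False)
    json.dump(config_dict, f)
    f.close()
    return ["python", flow_path, "--config", config_name, f.name,
            "argo-workflows", "create"]

cmd = config_to_cli("hello_flow/flow.py", "config", {"model": "val1"})
print(cmd[2:4])  # ['--config', 'config']
```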
h
got it, and whether done programmatically or through a file, setting `name` will override any value set currently in the `@project` decorator?
d
well, I wouldn't say "override". Your flow does need to be written to take configs into account. It would look something like this:
```python
from metaflow import Config, FlowSpec, config_expr, project, schedule

@project(name=config_expr("config.model"))
@schedule(hourly=True)
class HelloWorld(FlowSpec):

    config = Config("config", default_value={"model": "my_default_model"})
```
In this case, if you have no config passed in, the name of the project would be `my_default_model`, and if you did have a config it would use whatever was in that. So in that sense, it would override the default, but there is no `@project(name="my_default")` where we then magically override `my_default` if you provide a config. You have to say that the name comes from the config (but you can have a default config).
h
makes sense, thanks for all the help here!
d
sure, let me know if you have more questions. It's a new feature so there may still be quirks and things that don't work, so feel free to ask for help (and don't assume that you are doing something wrong 🙂).
b
@ripe-alarm-8919 cc