# ask-metaflow
g
we're building an extension following https://github.com/Netflix/metaflow-extensions-template/blob/master/README.md, and we'd like to do some validation when people need to interact with k8s, e.g. via `--with kubernetes` or `argo-workflows`. Currently, we put the validation code in `config/mfextinit_org.py`, but that gets run even when the user just tries to run the flow locally, which is annoying, especially when the validation fails and the user is left confused about why k8s is even relevant for a local run. What would be the recommended place for such checks? An equivalent question: how can the extension reliably tell whether the user's command needs to interact with k8s?
v
you could make a mutator of your own that applies `@kubernetes` based on your policies - much easier than creating an extension
(this is a new feature, in case you haven't seen it before 🙂 )
g
is it possible to make such validation transparent to users, i.e. users don't need to add extra code to enable it? With a mutator, it looks like the user needs to import the mutator decorator and apply it to their flow?
v
I'm glad you asked - the answer is yes - you can hide it. You have two options:
• You can attach it automatically via a local config file or an environment variable (`METAFLOW_DECOSPECS=mymutator`)
• Use the `BaseFlow` pattern and have your users derive their flows from it
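[editor's note] For concreteness, the first option could look like this in a shell session. This is a hedged sketch: `mymutator` is a placeholder for your own importable module exposing the mutator, and it must be on the user's `PYTHONPATH`:

```shell
# Attach a mutator to every flow run in this shell without touching user code.
# "mymutator" is a placeholder for your own importable mutator module.
export METAFLOW_DECOSPECS=mymutator
python flow.py run
```

The same setting can live in a local Metaflow config file so users never see it at all.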
d
to be clear, the answer is “you CAN make it transparent to the user”
ah, right, Ville corrected 🙂
v
haha, right, I noticed my answer was ambiguous 😄
g
Thank you for the pointers! I'll take a closer look to see how to adopt it.
v
don't hesitate to ask here if you need help - it's a new feature so not many examples floating around yet
g
in which metaflow version did mutators become available? we're currently using `2.14.0`
v
2.16
g
I feel a bit confused. In my case, it's the user who decides whether to run the flow locally or on kubernetes (e.g. by specifying `--with kubernetes` or using `argo-workflows create`); there is no pre-specified policy on when to use kubernetes. I'm looking for an integration point that can tell whether the user's command will require interaction with kubernetes, so I can do the validation accordingly. E.g. when the user just runs

```
python flow.py run
```

and `flow.py` makes no use of `@kubernetes`, then I can skip the validation. In the other cases, when

```
python flow.py run --with kubernetes
```

or

```
python flow.py argo-workflows create
```

then the validation logic should be triggered. Is a mutator still the right pattern?
v
two options:
• you can have a flow-level mutator that inspects whether `@kubernetes` is present and, if so, performs validation (this might work with `argo-workflows create` out of the box, but I need to confirm)
• instead of `--with kubernetes`, you can instruct your users to use `--with mykubernetes` (you can call it whatever you want), which performs validation and adds `@kubernetes` internally
g
I think I understand your suggestions, but I'm also wondering whether they can cover the `argo-workflows create` case. My impression is that `argo-workflows create` doesn't have much to do with `@kubernetes`. The validation we need is basically ensuring users are using the right k8s context when interacting with k8s.
v
yep, makes sense. Do your users call `argo-workflows create` manually? If it's done via CI/CD, it should be easy to insert a mutator programmatically
g
> do your users call `argo-workflows create` manually?

Currently, yes.
v
you can do something like

```python
import sys

from metaflow import FlowMutator, kubernetes

class validated_k8s(FlowMutator):
    def mutate(self, mutable_flow):
        # only act when the flow is being deployed to Argo Workflows
        if 'argo-workflows create' in ' '.join(sys.argv):
            print('deploying to argo')
            for name, s in mutable_flow.steps:
                s.add_decorator(kubernetes)
```

i.e. have a flow mutator that applies the desired `@kubernetes` decorators prior to deployment (and/or you can check what the user has added themselves)
g
Thank you for the suggestion! Sorry, I wasn't clear: it's not about adding `@kubernetes` to steps, it's more about detecting it in the flow so that the extension can validate the k8s context. Since we're still on an old version and mutators aren't available, I decided to just use a flow decorator like

```python
def flow_init(
        self,
        flow: FlowSpec,
        graph: graph_.FlowGraph,
        environment: metaflow_environment.MetaflowEnvironment,
        flow_datastore: flow_datastore_.FlowDataStore,
        metadata: metadata_.MetadataProvider,
        logger: Callable[..., None],
        echo: Callable[..., None],
        options: dict[str, Any],
    ):
        if any(deco.name == 'kubernetes' for step in flow for deco in step.decorators) or 'argo-workflows' in sys.argv or '--with kubernetes' in sys.argv:
            validate_kubectx(metaflow_config.CSS_RUN_ENV)
```

the condition is just for illustrating the idea and will be made more robust.
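[editor's note] A more robust version of that condition could be factored into a plain helper like the one below. This is a hypothetical sketch, not part of Metaflow: `needs_k8s_validation` and its arguments are invented names, and the argument parsing only approximates what the Metaflow CLI actually accepts (note that `'--with kubernetes' in sys.argv` in the snippet above would never match, since `sys.argv` holds `--with` and `kubernetes` as separate elements):

```python
def needs_k8s_validation(argv, flow_decorator_names):
    """Heuristic: does this Metaflow command touch Kubernetes?

    argv: command-line arguments (e.g. sys.argv);
    flow_decorator_names: names of decorators found on the flow's steps.
    Hypothetical helper for illustration only.
    """
    # any argo-workflows subcommand (create, trigger, ...) talks to k8s
    if "argo-workflows" in argv:
        return True
    # handle both "--with kubernetes" and "--with=kubernetes", including
    # decorators with options such as "--with kubernetes:cpu=2"
    for i, arg in enumerate(argv):
        if arg == "--with" and i + 1 < len(argv):
            if argv[i + 1].split(":")[0] == "kubernetes":
                return True
        elif arg.startswith("--with="):
            if arg[len("--with="):].split(":")[0] == "kubernetes":
                return True
    # finally, check decorators already attached to the flow's steps
    return "kubernetes" in flow_decorator_names
```

A local run like `python flow.py run` with no `@kubernetes` decorators then returns `False` and skips the validation, while `--with kubernetes` or `argo-workflows create` returns `True`.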
v
makes sense. You could do exactly the same in a mutator, calling `validate_kubectx`, but no need to change it if it works 🙂 mutators are way easier to create than extensions, and the mutator API is stable, whereas technically the extension APIs may introduce breaking changes in the future
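[editor's note] For illustration, the `validate_kubectx` check itself could be a small pure function along these lines. This is a sketch with hypothetical names and an adapted two-argument signature (the chat shows it called with one argument); in the real extension the active context would come from somewhere like `kubectl config current-context` or a kubeconfig parser rather than a parameter:

```python
def validate_kubectx(expected_env, current_context):
    # Hypothetical sketch: fail fast when the active kubeconfig context
    # does not look like it belongs to the target environment.
    # expected_env:    e.g. a value like CSS_RUN_ENV from the extension config
    # current_context: the active kubeconfig context name, however obtained
    if expected_env not in current_context:
        raise RuntimeError(
            "active k8s context %r does not match target environment %r; "
            "switch contexts (e.g. `kubectl config use-context ...`) and retry"
            % (current_context, expected_env)
        )
```

The point is simply to raise a clear, actionable error before any k8s interaction happens, instead of letting a deploy fail halfway against the wrong cluster.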
g
> mutators are way easier to create than extensions, and the mutator API is stable, whereas technically the extension APIs may introduce breaking changes in the future

understood, it's just legacy. I feel upgrading metaflow would be more painful since we forked metaflow. The custom decorators described at https://docs.metaflow.org/metaflow/composing-flows/custom-decorators are also new compared to the version we use, and they look like a much easier way to develop decorators than the extension way, i.e. subclassing `FlowDecorator`
d
may I ask what the need to fork was (versus developing an extension)? Asking to see if there is an opportunity to make something else extensible, which would negate the need to fork. The whole reason we introduced extensions was to avoid maintaining the Netflix fork that existed right when Metaflow was open-sourced.