# ask-metaflow
p
Is there any reason why the FlowSpec can't be defined lower than top-level? I'm getting issues where the FlowGraph can't find the concrete FlowSpec because `__import__()` only loads the top-level module, unlike `importlib.import_module()`. I found https://outerbounds-community.slack.com/archives/C02116BBNTU/p1699048473617079 but the situation described is different from mine. I have the following structure:
project_code/
  __init__.py
  pipeline/
    __init__.py
    a_raypipeline.py
    my_metaflow_flow.py <---- new to repo
It can't be imported for the graph traversal because `__import__()` is pulling in the `project_code` module instead of `project_code.pipeline.my_metaflow_flow`.
From the Python 3.10 docs on `__import__()`:
"Note how `__import__()` returns the toplevel module here because this is the object that is bound to a name by the `import` statement."
https://docs.python.org/3.10/library/functions.html#import__
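The difference between the two import mechanisms can be reproduced with a standard-library package. This is a minimal sketch; `xml.etree.ElementTree` is only a stand-in for a nested module such as the one in this thread, and it says nothing about how Metaflow itself resolves the flow module.

```python
import importlib

# Standard-library dotted module name, used purely for illustration.
dotted_name = "xml.etree.ElementTree"

# With a dotted name and no fromlist, __import__ returns the top-level package.
top = __import__(dotted_name)

# importlib.import_module returns the module named by the full dotted path.
leaf = importlib.import_module(dotted_name)

# A non-empty fromlist makes __import__ return the leaf module as well.
leaf_via_fromlist = __import__(dotted_name, fromlist=["ElementTree"])

print(top.__name__)
print(leaf.__name__)
print(leaf_via_fromlist is leaf)
```

So a lookup for a class on the object returned by a bare `__import__("a.b.c")` searches package `a`, not module `a.b.c`.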
s
what does your `my_metaflow_flow.py` look like?
p
it has something like
from metaflow import FlowSpec, step

class MyMetaflowFlow(FlowSpec):
    @step
    def start(self):
        # transitions must reference the bound step method
        self.next(self.end)

    @step
    def end(self):
        pass
no `if __main__` checks. that was intended to be in a separate file for reasons specific to the project.
the failure is during FlowGraph's ast parsing, which cannot find `MyMetaflowFlow` because the module `project_code` is imported instead of the module `project_code.pipeline.my_metaflow_flow`.
a
and how are you executing the flow?
p
python project_code/pipeline/run_my_metaflow_flow.py run
which is a different script doing some pre-processing of config files before invoking MyMetaflowFlow() in the `if __main__` block
note this needs to be a separate file to pass some parameters to the Flow while still having the full CLI. the sys.argv must be mutated before the click decorators are used, which means putting MyMetaflowFlow in its own module that can be imported. using the Runner to run the Flow would mean recreating the CLI. I want the Metaflow CLI + some preprocessing.
some local pre-processing.
a
yeah the `if __main__` bits are needed for metaflow to be able to successfully parse the flow structure
p
my question is why? it seems to be an implementation detail: using the `__import__()` builtin instead of `importlib`.
I understand that the `if __main__` block is required for the steps to execute. but does that have to preclude starting a run a different way, like via a different CLI module or script?
the pre-processing i need to do, which is only relevant locally, is configuration consolidation. the pipeline was originally written without any orchestration support, and each step is configured via a set of configuration files through Hydra. to allow the researcher to tweak the configs AND run remotely, i have a wrapper script that combines all these config files into one dict, writes that to a file, and passes that file into the flow as an IncludeFile. this must happen on the user's machine and cannot happen per-step, where the config could be reset to the default values. these config files are part of the python package, so they will be available in the base Docker image for a triggered run, like with a Runner.
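The consolidation step described above might look roughly like the following. This is a sketch under stated assumptions, not the actual wrapper script: it reads JSON files with the standard library instead of going through Hydra, the "later file wins" merge rule is an assumption, and `deep_merge`, `consolidate_configs`, and the file names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where values in `override` win; nested dicts are merged."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def consolidate_configs(config_paths: list[Path], out_path: Path) -> dict:
    """Merge the given JSON config files in order and write one combined file."""
    combined: dict = {}
    for path in config_paths:
        combined = deep_merge(combined, json.loads(path.read_text()))
    out_path.write_text(json.dumps(combined, indent=2, sort_keys=True))
    return combined


# Demo with two hypothetical config files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    defaults = tmp_dir / "defaults.json"
    local_edits = tmp_dir / "local_edits.json"
    defaults.write_text(json.dumps({"train": {"lr": 0.1, "epochs": 5}}))
    local_edits.write_text(json.dumps({"train": {"lr": 0.01}}))

    out_file = tmp_dir / "combined.json"
    combined = consolidate_configs([defaults, local_edits], out_file)
    written = json.loads(out_file.read_text())
```

The single output file is what would then be handed to the flow, so every step reads the same frozen configuration rather than re-resolving defaults.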
a
i see. would an approach like this work?
p
almost. i want to use the metaflow CLI not the hydra one (edits are direct to files instead of via CLI overrides). so i can't use a Runner
a
could you elaborate on "use the metaflow CLI not the hydra one (edits are direct to files instead of via CLI overrides)"? maybe i don't have the full understanding here of the issue at hand
p
i need to do some pre-processing that is only relevant to the local execution of the Flow. this pre-processing takes all the local hydra configs, distills them into 1 file, and passes that file pointer to the Flow to use locally or on Kubernetes. to pass this in as a new arg, `sys.argv` needs to be modified, but `@click` seems to read `sys.argv` as soon as it's imported, which is when `FlowSpec` is imported, which needs to happen before the specific `FlowSpec` can be defined. therefore the pre-processing and manipulation of `sys.argv` cannot be in the same file (without doing some weird double `if __main__` blocks). this would be fine if the FlowSpec could be imported after the pre-processing and `sys.argv` manipulation had finished. it cannot, due to the AST checker's use of `__import__()` instead of `importlib`, as linked at the top of this thread.
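The intended ordering (mutate `sys.argv` first, import the flow module afterwards) can be sketched with the standard library alone. The helper names and the `--config_file` flag are hypothetical, and appending after the `run` subcommand is an assumption about where the parameter belongs. Per this thread, instantiating a FlowSpec from a nested module this way currently fails during graph parsing, so this illustrates the goal rather than a working Metaflow recipe.

```python
import importlib
import sys


def inject_parameter(argv: list[str], flag: str, value: str) -> list[str]:
    """Return a copy of argv with `flag value` appended, unless flag is already present."""
    if flag in argv:
        return list(argv)
    return [*argv, flag, value]


def run_wrapped(module_name: str, class_name: str, flag: str, value: str) -> None:
    """Mutate sys.argv first, then import the flow module and instantiate the flow."""
    sys.argv = inject_parameter(sys.argv, flag, value)
    # Deferred import: nothing from the flow module is loaded before argv is final.
    flow_module = importlib.import_module(module_name)
    getattr(flow_module, class_name)()


# Demo of the argv handling only; run_wrapped is not called here.
argv = ["run_my_metaflow_flow.py", "run"]
new_argv = inject_parameter(argv, "--config_file", "combined.json")
unchanged = inject_parameter(new_argv, "--config_file", "other.json")
```

Leaving an explicitly passed flag untouched means a user-supplied value on the command line still takes precedence over the wrapper's default.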
my users want to use the CLI attached to the script via the `FlowSpec` class. they have not been using Hydra CLI's config overrides, so they are fine directly editing local files to tweak their runs.