# ask-metaflow
g
Hey there! I’m writing unit tests using Pytest and the metaflow.Runner class. Here is what a simple test looks like:
from metaflow import Run, Runner

def test_sample():
    metaflow_data = run_pipeline()
    assert metaflow_data.mlflow_run_id is not None


def run_pipeline():
    with Runner("pipelines/training.py", environment="conda").run() as running:
        return Run(running.run.pathspec).data
This works for some of the flows, but it fails for the `training.py` flow in this example because that file imports some functions from a separate `common.py` file. Here is the directory structure:
├── pipelines/
│   ├── training.py
│   └── common.py
└── tests/
    └── test_sample.py
Here is the error I get when I try to run the test (and the `Runner` tries to execute the training flow):
>   from common import packages
E   ModuleNotFoundError: No module named 'common'
I tried fixing this problem by adding a symlink inside `/tests` pointing to `/pipelines`. I also tried passing a `cwd` to `Runner` pointing to the `/pipelines` folder. Any suggestions?
c
There is a `cwd` arg in the `Runner` constructor, which I think is supposed to work like `Runner(..., cwd="../pipelines")`. @brainy-truck-72938 could you confirm?
g
The `cwd` argument is there, but I can’t get it to work. I’ve tried:
1. Setting `cwd` to the relative location of `/pipelines`: `cwd="../pipelines"`
2. Setting it to the absolute path: `cwd="/Users/…/pipelines/"`
It doesn’t work in either case.
c
Hmm, maybe I misinterpreted the docs. Another option is to symlink `tests/common.py` to the actual `pipelines/common.py` file. As for the SDK-based approach, I'll wait for Madhur or someone with more knowledge of that feature to answer.
g
Let’s wait for an SDK-related answer. Symlinks don’t work with the `Runner` class either. (Symlinks do work when running flows with `subprocess`, as described here: https://docs.outerbounds.com/use-pytest/)
b
Hi @glamorous-pillow-95848, currently the `cwd` argument only affects where the `subprocess` execution takes place; it doesn't affect where the flow file is loaded from. That's why the `cwd` option doesn't help: at the time the file is loaded, `from common import packages` already needs to work. I was able to make it work by using `PYTHONPATH`, i.e.:
PYTHONPATH=/Users/madhur/Desktop/outerbounds/office-hours/sant/pipelines python tests/test_sample.py
Here, the folder `sant` has this structure:
. <-- this is `sant`
├── pipelines
│   ├── common.py
│   └── training.py
└── tests
    └── test_sample.py
The patch in https://github.com/Netflix/metaflow/pull/2182 should help, and then you won't need to use `cwd` at all.
Let me know if it works for you.
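A minimal sketch of why the `PYTHONPATH` prefix helps, assuming (as described above) that the flow file is imported by path in the test's own process before anything runs. It rebuilds the `pipelines/` and `tests/` layout in a temp directory and uses `importlib` as a stand-in for however Metaflow actually loads the file; `load_flow.py`, `run_loader`, and the file contents are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Throwaway copy of the layout from the thread (file contents are made up).
root = Path(tempfile.mkdtemp())
(root / "pipelines").mkdir()
(root / "tests").mkdir()
(root / "pipelines" / "common.py").write_text("packages = ['pandas']\n")
(root / "pipelines" / "training.py").write_text("from common import packages\n")

# Hypothetical stand-in for the test process: it imports training.py by path,
# roughly the way a parent process would read a flow file before running it.
(root / "tests" / "load_flow.py").write_text(textwrap.dedent("""\
    import importlib.util
    import sys

    spec = importlib.util.spec_from_file_location("training", sys.argv[1])
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs `from common import packages`
    print("loaded OK")
"""))


def run_loader(pythonpath=None):
    # Start from the current environment, minus any PYTHONPATH already set.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    # Mirrors `python tests/test_sample.py` run from the repo root.
    return subprocess.run(
        [sys.executable, "tests/load_flow.py", "pipelines/training.py"],
        cwd=root, env=env, capture_output=True, text=True,
    )


# sys.path[0] is tests/ (the script's own directory), so `common` is not found.
without_pythonpath = run_loader()
# PYTHONPATH adds pipelines/ to sys.path, so the sibling import resolves.
with_pythonpath = run_loader(pythonpath=str(root / "pipelines"))

print(without_pythonpath.returncode, with_pythonpath.stdout.strip())
```

The first run fails with the same `ModuleNotFoundError: No module named 'common'` as in the question; the second succeeds only because `pipelines/` is now on the import path.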
g
Okay, so the `PYTHONPATH` workaround works… but it’s ugly 🙂 It looks like you submitted a patch to fix this without having to use `PYTHONPATH` or `cwd`. Is that what the patch does?
b
yeah, it does:
import os
import sys
flow_dir = os.path.dirname(os.path.abspath(flow_file))
sys.path.insert(0, flow_dir)  # make sibling modules like common.py importable
and then removes it again in a `finally` block:
sys.path.remove(flow_dir)
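The insert/remove pattern described for the patch can be sketched as a context manager. This is a hypothetical helper, not Metaflow's API: `flow_dir_on_path` and `load_module_from_file` are illustrative names, and `importlib` stands in for the real flow loader.

```python
import importlib.util
import os
import sys
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def flow_dir_on_path(flow_file):
    """Temporarily prepend the flow file's directory to sys.path (hypothetical helper)."""
    flow_dir = os.path.dirname(os.path.abspath(flow_file))
    sys.path.insert(0, flow_dir)
    try:
        yield flow_dir
    finally:
        # Undo the change even if loading the flow raises; remove() drops the
        # first match, which is the entry inserted at index 0 above.
        sys.path.remove(flow_dir)


def load_module_from_file(path, name):
    # Import a module from an explicit file path, without touching sys.path.
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demo with a throwaway copy of the layout from the thread (contents are made up).
pipelines = Path(tempfile.mkdtemp()) / "pipelines"
pipelines.mkdir()
(pipelines / "common.py").write_text("packages = ['pandas']\n")
(pipelines / "training.py").write_text("from common import packages\n")
flow_file = str(pipelines / "training.py")

with flow_dir_on_path(flow_file):
    training = load_module_from_file(flow_file, "training")

print(training.packages)
print(str(pipelines) in sys.path)
```

One caveat with this approach: after the first import, `common` stays cached in `sys.modules`, so two flows in the same test session with different `common.py` files would both see whichever one was imported first.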
g
Awesome! You rock! I’ll wait for the patch to make it into the library. I have everything working now with `PYTHONPATH` (set via `pytest.ini`), but it breaks Metaflow’s colors and it feels like a hack.
b
Yeah, I'll post a note about this PR internally to speed up the review.
Also, a small note: you can directly use `return running.run.data`. `running.run` gives you the `Run` object, so you don't need to construct it explicitly.
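With that tip applied, the `run_pipeline` helper from the question might look like this. It's a sketch: the flow path and `environment="conda"` are carried over from the original test, and the `metaflow` import sits inside the function only so the snippet can be loaded without Metaflow installed (a top-level import works just as well).

```python
def run_pipeline(flow_file="pipelines/training.py"):
    """Run a flow with metaflow.Runner and return the finished run's artifacts."""
    # Imported here only to keep this sketch importable without metaflow.
    from metaflow import Runner

    with Runner(flow_file, environment="conda").run() as running:
        # running.run is already a Run object, so there is no need to rebuild
        # it with Run(running.run.pathspec).
        return running.run.data
```

The existing `test_sample` can stay as it is; only the helper changes.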
g
Good to know. Thanks!