# ask-metaflow
e
Hey! I was wondering how Metaflow handles package imports across steps. We're using a separate custom image for each step, where each image has a separate set of packages. Specifically, Metaflow is returning an error saying that it can't find a module from the previous step, even though the erroring step doesn't attempt to import that module. For example, consider these flow steps:
```python
@kubernetes(image='a_image_name')
@step
def a(self):
    from some_package_for_a import a_logic

    self.output = a_logic()

    self.next(self.b)

@kubernetes
@step
def b(self, inputs):
    print([input for input in inputs])
```
I'd expect that to finish without errors, but instead we get:
```
2025-05-22 16:02:22.552 [11/b/60 (pid 179819)] ModuleNotFoundError: No module named 'some_package_for_a'
```
Why is this the case?
d
Are you accessing an artifact from b that was pickled in a and requires that package?
What’s the full trace?
e
Potentially? Here's a trace (with actual names instead of `a` and `b`):
```
2025-05-22 16:02:22.550 [11/join/60 (pid 179819)] File "/metaflow/metaflow/cli_components/step_cmd.py", line 167, in step
2025-05-22 16:02:22.551 [11/join/60 (pid 179819)] task.run_step(
2025-05-22 16:02:22.551 [11/join/60 (pid 179819)] File "/metaflow/metaflow/task.py", line 670, in run_step
2025-05-22 16:02:22.551 [11/join/60 (pid 179819)] self._exec_step_function(step_func, input_obj)
2025-05-22 16:02:22.551 [11/join/60 (pid 179819)] File "/metaflow/metaflow/task.py", line 64, in _exec_step_function
2025-05-22 16:02:22.551 [11/join/60 (pid 179819)] step_function(input_obj)
2025-05-22 16:02:22.551 [11/join/60 (pid 179819)] File "/metaflow/flow.py", line 35, in join
2025-05-22 16:02:22.551 [11/join/60 (pid 179819)] self.results = [(input.packages, input.updates) for input in inputs]
2025-05-22 16:02:22.551 [11/join/60 (pid 179819)] File "/metaflow/flow.py", line 35, in <listcomp>
2025-05-22 16:02:22.551 [11/join/60 (pid 179819)] self.results = [(input.packages, input.updates) for input in inputs]
2025-05-22 16:02:22.551 [11/join/60 (pid 179819)] File "/metaflow/metaflow/flowspec.py", line 440, in __getattr__
2025-05-22 16:02:22.551 [11/join/60 (pid 179819)] x = self._datastore[name]
2025-05-22 16:02:22.552 [11/join/60 (pid 179819)] File "/metaflow/metaflow/datastore/task_datastore.py", line 45, in method
2025-05-22 16:02:22.552 [11/join/60 (pid 179819)] return f(self, args, kwargs)
2025-05-22 16:02:22.552 [11/join/60 (pid 179819)] File "/metaflow/metaflow/datastore/task_datastore.py", line 865, in __getitem__
2025-05-22 16:02:22.552 [11/join/60 (pid 179819)] _, obj = next(self.load_artifacts([name]))
2025-05-22 16:02:22.552 [11/join/60 (pid 179819)] File "/metaflow/metaflow/datastore/task_datastore.py", line 369, in load_artifacts
2025-05-22 16:02:22.552 [11/join/60 (pid 179819)] yield name, pickle.loads(blob)
2025-05-22 16:02:22.552 [11/join/60 (pid 179819)] ModuleNotFoundError: No module named 'ubuntu_updates'
```
d
Yep that’s it
e
Taking a closer look, I think that's the case. The A step's `self.output` is an instance of a class defined in the A package, so it makes sense that when the B step tries to unpickle it, it can't find that type.
Sweet, thanks for the insight!
d
Anytime