colossal-tent-96436
11/06/2025, 2:55 PM
Combining the @timeout decorator with @catch. According to the documentation:
It will cause the step to be retried if needed and the exception will be caught by the @catch decorator, if present.
But that is not the behaviour I'm observing. When the timeout is reached, the whole process is aborted and the pipeline fails.
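For reference, a minimal repro sketch of the combination being described, assuming the documented behaviour (the timeout raises an exception, @retry retries if present, and @catch swallows the final failure so the flow continues); the decorator order and timings here are my own choices, not taken from the docs:

from metaflow import FlowSpec, step, timeout, catch, retry

class TimeoutCatchFlow(FlowSpec):

    # Expectation per the docs: the 5-second timeout triggers, the step is
    # retried once, and the final failure is captured in self.timeout_error
    # instead of aborting the whole run.
    @catch(var="timeout_error")
    @retry(times=1)
    @timeout(seconds=5)
    @step
    def start(self):
        import time
        time.sleep(60)  # deliberately exceed the timeout
        self.next(self.end)

    @step
    def end(self):
        # Inspect whether the previous step timed out
        print("caught:", self.timeout_error)

if __name__ == "__main__":
    TimeoutCatchFlow()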
shy-refrigerator-15055
11/06/2025, 2:52 PM
2.19.5, mainly for the conditional step transitions, but we have now run into an issue that makes absolutely no sense to us. Details in 🧵, thanks in advance.
fast-vr-44972
11/06/2025, 11:34 AM
adamant-eye-92351
11/05/2025, 11:03 AM
@conda and @pypi decorators. In our case, every flow that is run on AWS Batch, whether it's deployed or not, will use the custom Docker image, which creates two main pain points:
• When adding new dependencies, the inability to run a flow on Batch without first updating the pyproject.toml on the main branch (so that the CI/CD triggers and the Docker image is rebuilt, including the freshly added packages)
• The obvious bottlenecks it will create as more projects are created, depending on larger and larger images
• My understanding is that the official way to avoid this is the @conda or @pypi decorators, is that correct?
◦ Is there an official alternative if for some reason someone would like to stick with poetry or uv?
• If it's not yet possible, I was wondering what everyone's opinion is on eventually having an official decorator, say @uv(pyproject="pyproject.toml") or @uv(lock="uv.lock"), that would allow specifying the dependencies at the flow level while still creating the venv at runtime.
◦ Similar to what the doc about Netflix's Metaflow extensions describes: "A more full-fledged environment command allowing you to resolve environments using external requirements.txt or environment.yml files as well as inspect and rehydrate environments used in any previously run step"?
In any case, happy to hear your thoughts and recommendations on that topic, thanks a lot in advance!
PS: the spin feature looks amazing 🤩
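For context, a minimal sketch of the current official route the question refers to: declaring dependencies with @pypi_base/@pypi so the virtual environment is resolved and created at run time instead of being baked into a Docker image. Package names and versions below are placeholders.

from metaflow import FlowSpec, step, pypi_base, pypi

@pypi_base(python="3.12", packages={"pandas": "2.2.2"})
class RuntimeDepsFlow(FlowSpec):

    @pypi(packages={"scikit-learn": "1.5.1"})  # extra dependencies for just this step
    @step
    def start(self):
        import pandas
        import sklearn
        print(pandas.__version__, sklearn.__version__)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    RuntimeDepsFlow()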
witty-ability-67045
11/05/2025, 12:37 AM
boundless-sugar-55740
11/04/2025, 2:05 PM
@schedule to our modern Argo Workflows cluster.
The core problem is a schema incompatibility in the generated Argo CronWorkflow manifest.
Key Issue: Singular vs. Plural Fields
1. Metaflow Output: Metaflow is generating the CronWorkflow using the deprecated singular field: spec.schedule: '*/15 * * * *' (a string).
2. Argo Controller Requirement: Our Argo Controller (v3.6+) requires the current plural field: spec.schedules: ['*/15 * * * *'] (a list).
3. The Failure: As a result, the Argo Controller sees an empty list (schedules: []) and throws the error: "cron workflow must have at least one schedule".
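As a stopgap, a hypothetical post-processing sketch (not a Metaflow feature): if the generated CronWorkflow manifest is exported to a file and applied manually, the deprecated singular spec.schedule could be rewritten into the plural spec.schedules that Argo >= 3.6 expects before submitting. Assumes PyYAML is installed.

import sys
import yaml

def patch_cronworkflow(path: str) -> None:
    # Load the generated manifest, move spec.schedule into spec.schedules,
    # and write the patched manifest back in place.
    with open(path) as f:
        manifest = yaml.safe_load(f)
    spec = manifest.get("spec", {})
    schedule = spec.pop("schedule", None)
    if schedule and not spec.get("schedules"):
        spec["schedules"] = [schedule]  # '*/15 * * * *' -> ['*/15 * * * *']
    with open(path, "w") as f:
        yaml.safe_dump(manifest, f)

if __name__ == "__main__":
    patch_cronworkflow(sys.argv[1])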
billions-memory-41337
11/03/2025, 7:30 PM
early-nest-89176
11/03/2025, 12:03 PM
fast-honey-9693
10/31/2025, 12:34 AM
host_volumes is working properly.
i've got an AWS setup, using fargate. the recent change is that i'm using a custom container. the batch job is able to launch the ec2 instance, and the task starts, but for some reason the mount isn't passed through to the container properly.
more details and things in thread.
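For reference, a minimal sketch of the host_volumes usage being described, assuming @batch mounts each listed host path at the same path inside the container; the /data path and resource sizes are placeholders.

from metaflow import FlowSpec, step, batch

class HostVolumeFlow(FlowSpec):

    # Assumed behaviour: the host path /data is mounted into the container at /data.
    @batch(cpu=1, memory=4096, host_volumes=["/data"])
    @step
    def start(self):
        import os
        # If the mount is passed through, the host directory's contents show up here.
        print("mounted contents:", os.listdir("/data"))
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    HostVolumeFlow()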
chilly-cat-54871
10/30/2025, 1:58 PM
quick-carpet-67110
10/30/2025, 1:27 PM
Is the metaflow_metadata_service Docker image the same in ECR (under the Outerbounds org) and in DockerHub (under the netflix-oss org), or are there differences?
https://gallery.ecr.aws/outerbounds/metaflow_metadata_service
https://hub.docker.com/r/netflixoss/metaflow_metadata_service/tags
happy-journalist-26770
10/30/2025, 10:00 AM
gorgeous-florist-65298
10/29/2025, 5:02 PM
abundant-wolf-81413
10/28/2025, 9:16 PM
@pypi_base(
    packages={
        "dalex": "1.7.2",
        ...
    },
    python="3.12.6",
)
But when Metaflow bootstraps the virtual environment, I get:
ERROR: Could not find a version that satisfies the requirement dalex==1.7.2
(from versions: 0.1.0, 0.1.2, ..., 0.2.0)
It seems that Micromamba / Metaflow is only seeing the old dalex versions (0.1.x → 0.2.0), even though dalex 1.7.2 exists on PyPI and installs fine outside Metaflow.
For example, if I run locally:
pip install dalex==1.7.2
It works perfectly, but inside Metaflow’s Micromamba environment it fails.
It looks like Metaflow 2.12.5 with Python 3.12 + Micromamba cannot find the dalex 1.7.x wheel!
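One hypothetical sanity check, run outside Metaflow with the same Python: ask pip directly which dalex versions the configured index exposes and compare against what the bootstrap reports (pip index is available in pip >= 21.2 and is still marked experimental).

import subprocess
import sys

# List the dalex versions pip can see for this interpreter and its configured index.
result = subprocess.run(
    [sys.executable, "-m", "pip", "index", "versions", "dalex"],
    capture_output=True,
    text=True,
)
print(result.stdout or result.stderr)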
plain-carpenter-99052
10/28/2025, 5:43 PM
When I use @batch(memory=512, cpu=1, gpu=1), the associated job remains in the RUNNABLE state indefinitely:
• I have double-checked that compute resources are sufficient
• I have an ECS cluster with a g4dn.xlarge instance attached to it, launched by the Auto Scaling Group that Metaflow or Batch created.
• I use the AMI ami-02124cf261ef1e336, so CUDA is installed.
My CloudTrail shows a RunTask event that fails with the response element:
"responseElements": {
"tasks": [],
"failures": [
{
"arn": "arn:aws:ecs:eu-west-3:*******:container-instance/62da090445a44320811473cd2c0e4055",
"reason": "RESOURCE:GPU"
}
I can't understand why; it's as if Batch can't see the GPU resource attached to my EC2 instance, and there is no log to help...
When I launch the same step without gpu=1 it works perfectly well.
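A hypothetical diagnostic, outside Metaflow, to confirm whether the container instance actually registered its GPU with ECS; the cluster name and region below are placeholders.

import boto3

CLUSTER = "my-batch-ecs-cluster"  # placeholder: the ECS cluster Batch uses

ecs = boto3.client("ecs", region_name="eu-west-3")
arns = ecs.list_container_instances(cluster=CLUSTER)["containerInstanceArns"]
if not arns:
    print("no container instances registered in", CLUSTER)
else:
    detail = ecs.describe_container_instances(cluster=CLUSTER, containerInstances=arns)
    for inst in detail["containerInstances"]:
        gpus = [r for r in inst["registeredResources"] if r["name"] == "GPU"]
        # With a GPU-capable ECS AMI the instance should register a GPU resource;
        # if it does not, Batch can never satisfy gpu=1 and the job stays RUNNABLE.
        print(inst["ec2InstanceId"], "registered GPUs:", gpus or "none")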
abundant-quill-72601
10/28/2025, 4:35 PM
quick-carpet-67110
10/28/2025, 8:38 AM
It looks like the google-api-core release yesterday is not compatible with Metaflow, since we have started seeing massive failures of all our pipelines this morning with this message:
[2025-10-28, 07:01:50 UTC] {pod_manager.py:471} INFO - [base] File "/usr/local/bin/download-gcp-object", line 5, in <module>
[2025-10-28, 07:01:50 UTC] {pod_manager.py:471} INFO - [base] from simple_gcp_object_downloader.download_gcp_object import main
[2025-10-28, 07:01:50 UTC] {pod_manager.py:471} INFO - [base] File "/usr/local/lib/python3.12/site-packages/simple_gcp_object_downloader/download_gcp_object.py", line 2, in <module>
[2025-10-28, 07:01:50 UTC] {pod_manager.py:471} INFO - [base] from google.cloud import storage
[2025-10-28, 07:01:50 UTC] {pod_manager.py:471} INFO - [base] File "/usr/local/lib/python3.12/site-packages/google/cloud/storage/__init__.py", line 35, in <module>
[2025-10-28, 07:01:50 UTC] {pod_manager.py:471} INFO - [base] from google.cloud.storage.batch import Batch
[2025-10-28, 07:01:50 UTC] {pod_manager.py:471} INFO - [base] File "/usr/local/lib/python3.12/site-packages/google/cloud/storage/batch.py", line 43, in <module>
[2025-10-28, 07:01:50 UTC] {pod_manager.py:471} INFO - [base] from google.cloud import exceptions
[2025-10-28, 07:01:50 UTC] {pod_manager.py:471} INFO - [base] File "/usr/local/lib/python3.12/site-packages/google/cloud/exceptions/__init__.py", line 24, in <module>
[2025-10-28, 07:01:50 UTC] {pod_manager.py:471} INFO - [base] from google.api_core import exceptions
[2025-10-28, 07:01:50 UTC] {pod_manager.py:471} INFO - [base] File "/usr/local/lib/python3.12/site-packages/google/api_core/__init__.py", line 20, in <module>
[2025-10-28, 07:01:50 UTC] {pod_manager.py:471} INFO - [base] from google.api_core import _python_package_support
[2025-10-28, 07:01:50 UTC] {pod_manager.py:471} INFO - [base] File "/usr/local/lib/python3.12/site-packages/google/api_core/_python_package_support.py", line 28, in <module>
[2025-10-28, 07:01:50 UTC] {pod_manager.py:471} INFO - [base] from packaging.version import parse as parse_version
[2025-10-28, 07:02:00 UTC] {pod_manager.py:471} INFO - [base] ModuleNotFoundError: No module named 'packaging'
The failure is coming from this line. I've looked through the Metaflow codebase and it looks like none of the GCP dependencies here are pinned to specific versions, so they are probably pulling in the latest google-api-core version.
future-crowd-14830
10/24/2025, 2:38 PM
chilly-cat-54871
10/23/2025, 4:03 PM
join? Is it absolutely required, or is there some type of no-op I can specify to avoid loading a million artifacts from the datastore if I don't need to? Am I missing something / thinking about this incorrectly?
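For what it's worth, a minimal sketch of the pattern in question: the join step after a foreach is still required, but artifacts on inputs are loaded lazily, so a join that never touches them (and doesn't call merge_artifacts) should not have to pull the per-split artifacts; treat the lazy-loading claim as my understanding rather than a guarantee.

from metaflow import FlowSpec, step

class NoOpJoinFlow(FlowSpec):

    @step
    def start(self):
        self.items = list(range(1000))
        self.next(self.process, foreach="items")

    @step
    def process(self):
        self.result = self.input * 2  # one artifact per split
        self.next(self.join)

    @step
    def join(self, inputs):
        # Deliberately ignore `inputs`: nothing is read, nothing is merged forward.
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    NoOpJoinFlow()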
cool-notebook-79020
10/23/2025, 10:59 AM
Timeout: loading cards
Most of the time it works after refreshing the page a few times, but not always.
Running on AWS ECS.
lively-lunch-9285
10/21/2025, 7:37 PM
abundant-byte-82093
10/20/2025, 11:54 AM
from metaflow import FlowSpec, Parameter, step

def building_block_echo(input: str) -> None:
    print(f"echo... {input}.")

class MyEchoPipeline(FlowSpec):
    test = Parameter(
        "test",
        help="A test parameter to show mypy typing works",
        default=1,
        type=int,
    )

    @step
    def start(self):
        print(type(self.test))
        building_block_echo(self.test)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    MyEchoPipeline()
hundreds-rainbow-67050
10/20/2025, 3:56 AM
future-crowd-14830
10/16/2025, 5:44 PM
--package-suffixes and I'm also using a uv environment via --environment=uv. Since these arguments come before the metaflow subcommands (run, show, etc.), it doesn't appear that I can set them in a configuration somewhere. Is there some way to set these via configuration, or even hard-code them into the flow script? I tried argument injection in __main__ but it didn't work.
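One possible sketch, under the assumption that these top-level options fall back to METAFLOW_* settings (METAFLOW_DEFAULT_ENVIRONMENT and METAFLOW_DEFAULT_PACKAGE_SUFFIXES are my guesses at the relevant names; they could also be set in ~/.metaflowconfig/config.json): export them in the environment before metaflow is imported at the top of the flow script.

import os

# Assumed config names; set before importing metaflow so they are picked up
# when metaflow reads its configuration.
os.environ.setdefault("METAFLOW_DEFAULT_ENVIRONMENT", "uv")
os.environ.setdefault("METAFLOW_DEFAULT_PACKAGE_SUFFIXES", ".py,.yaml,.sql")

from metaflow import FlowSpec, step

class ConfiguredFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    ConfiguredFlow()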
billions-memory-41337
10/16/2025, 2:17 AM
quick-carpet-67110
10/15/2025, 9:17 AM
python transformed_metaflow_pipelines/some_metaflow_file.py --environment=pypi --branch something airflow create --generate-new-token pipelines/some_metaflow_file.py
Error:
2025-10-13 11:32:58.300 Bootstrapping virtual environment(s) ...
Micromamba ran into an error while setting up environment:
command '/home/runner/.metaflowconfig/micromamba/bin/micromamba create --yes --no-deps --download-only --safety-checks=disabled --no-extra-safety-checks --repodata-ttl=86400 --prefix=/tmp/tmpixz_p440/prefix --quiet
(omitting a bunch of package names that get dumped to the stack trace)
returned error (1)
critical libmamba Unable to read repo solv file 'conda-forge/noarch', error was: unexpected EOF, depth = 3
Unfortunately, this error does not happen every time this command is run, and thus far we have not been able to pin down the exact conditions under which it happens, but I'm wondering if someone else has seen this before.
delightful-actor-70552
10/14/2025, 1:39 PM
delightful-zebra-65925
10/13/2025, 1:24 PM
fast-vr-44972
10/13/2025, 10:14 AM
narrow-waitress-79414
10/10/2025, 6:29 PM