future-crowd-14830
10/16/2025, 5:44 PM
I'm using --package-suffixes and I'm also using a uv environment via --environment=uv. Since these arguments come before the metaflow commands (run, show, etc.), it doesn't appear that I can set them in a configuration somewhere. Is there some way to set these via a configuration, or even hard-code them into the flow script? I tried argument injection in __main__ but it didn't work.
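A possible workaround, sketched below: top-level CLI options generally have METAFLOW_* configuration counterparts that are read from the environment when metaflow is imported. Assuming METAFLOW_DEFAULT_ENVIRONMENT and METAFLOW_DEFAULT_PACKAGE_SUFFIXES are the right keys for your version (worth verifying against metaflow.metaflow_config), they could be hard-coded at the top of the flow script:

import os

# hedged sketch: these env vars are assumed to be read at import time,
# so they must be set before the first metaflow import
os.environ.setdefault("METAFLOW_DEFAULT_ENVIRONMENT", "uv")
os.environ.setdefault("METAFLOW_DEFAULT_PACKAGE_SUFFIXES", ".py,.txt,.yaml")

from metaflow import FlowSpec, step


class MyFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    MyFlow()

The same keys can also live in ~/.metaflowconfig/config.json, which avoids touching the script entirely.

billions-memory-41337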
10/16/2025, 2:17 AM
quick-carpet-67110
10/15/2025, 9:17 AM
python transformed_metaflow_pipelines/some_metaflow_file.py --environment=pypi --branch something airflow create --generate-new-token pipelines/some_metaflow_file.py
Error:
2025-10-13 11:32:58.300 Bootstrapping virtual environment(s) ...
Micromamba ran into an error while setting up environment:
command '/home/runner/.metaflowconfig/micromamba/bin/micromamba create --yes --no-deps --download-only --safety-checks=disabled --no-extra-safety-checks --repodata-ttl=86400 --prefix=/tmp/tmpixz_p440/prefix --quiet
(omitting a bunch of package names that get dumped to the stack trace)
returned error (1)
critical libmamba Unable to read repo solv file 'conda-forge/noarch', error was: unexpected EOF, depth = 3
Unfortunately, this error does not happen every time this command is run, and thus far we have not been able to pin down the exact conditions under which it happens, but I'm wondering if someone else has seen this before.
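The "unexpected EOF" while reading a repo .solv file usually points to corrupted cached repodata rather than a bad package set. A hedged workaround sketch, assuming the cache of the bundled micromamba shown in the log is the culprit and that micromamba's clean subcommand mirrors conda's:

# force fresh repodata/package downloads on the next bootstrap
/home/runner/.metaflowconfig/micromamba/bin/micromamba clean --all --yes

delightful-actor-70552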
10/14/2025, 1:39 PM
delightful-zebra-65925
10/13/2025, 1:24 PM
fast-vr-44972
10/13/2025, 10:14 AM
narrow-waitress-79414
10/10/2025, 6:29 PM
stale-ambulance-25084
10/09/2025, 5:45 PM
rich-agent-87730
10/08/2025, 1:48 PM
I'm using the Config and config_expr mechanisms (which is pretty cool) to load in all of the data for my decorators. There is a pain point: it is difficult to resume a failed flow with a fix in the config file.
One example is the @batch decorator: if I get an OOM error, I can just up the amount of memory in the configs, but if I resume the flow, the old cached config gets pulled instead of the fixed new value. I currently have to edit the flow and manually update the @batch decorator with the new values.
Is there a better way to do this?
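For context, a minimal sketch of the pattern being described, assuming a hypothetical config file cfg.json containing e.g. {"batch_memory": 16000}:

from metaflow import Config, FlowSpec, batch, config_expr, step


class TrainFlow(FlowSpec):
    cfg = Config("cfg", default="cfg.json")

    # memory comes from the config; resume reuses the cached config,
    # which is exactly the pain point described above
    @batch(memory=config_expr("cfg.batch_memory"))
    @step
    def start(self):
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    TrainFlow()

happy-journalist-26770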
10/08/2025, 1:35 PM
Is there a way to provide a .netrc file or set up Git config for GitHub auth? I need to install a package from a private GitHub repo using uv.
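One possible approach, sketched with loud assumptions: materialize a ~/.netrc before uv resolves the private dependency. GITHUB_TOKEN is a hypothetical secret/env-var name, and whether this runs early enough depends on where the uv resolution happens in your setup:

import os
import pathlib

# hypothetical sketch: git and uv can read ~/.netrc for https://github.com auth
netrc = pathlib.Path.home() / ".netrc"
netrc.write_text(
    "machine github.com\n"
    "login x-access-token\n"
    f"password {os.environ['GITHUB_TOKEN']}\n"
)
netrc.chmod(0o600)  # netrc holds credentials; keep it private

silly-megabyte-67326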
10/07/2025, 11:15 PM
straight-shampoo-11124
10/07/2025, 7:39 PM
abundant-quill-72601
10/03/2025, 8:41 PM
We're looking for a way to set priorityClassName on the pods that Metaflow schedules:
• Tried adding a default priority class for Argo WF; it is also not supported
◦ Argo Workflows supports the podPriorityClassName field, but unless the upstream tool (Metaflow) exposes this in its generated Argo manifests, there is no clean injection route
• Tried adding METAFLOW_KUBERNETES_POD_SPEC_OVERRIDE; this works for direct k8s jobs, but we are running through Argo WF, and my test failed...
• Adding priorityClassName to a WorkflowTemplate (which we can create) via a direct field is also not natively available in current Metaflow-Argo integrations...
• And, of course, users cannot add pod specs to their Metaflow jobs. Not supported yet.
Let me know your recommendations, thanks a bunch!!!
alert-needle-14247
10/03/2025, 3:07 PM
We had been hitting "StatusReason": "Container Overrides length must be at most 8192".
I’m happy to see that this has been fixed in version 2.18.3, using the command:
Deployer(flow, environment="pypi").step_functions().create(max_workers=5, compress_state_machine=True)
But now I get the following error:
1759499526663,Downloading code package...
1759499526675," File ""<string>"", line 1"
1759499526675," import boto3, os; ep=os.getenv(\""METAFLOW_S3_ENDPOINT_URL\""); boto3.client(\""s3\"", **({\""endpoint_url\"":ep} if ep else {})).download_file(\""***\"", \""***/data/c3/c365cb66d0989afafb2a3ec7ee7bcb01504793a1\"", \""job.tar\"")"
1759499526675, ^
1759499526675,SyntaxError: unexpected character after line continuation character
1759499536683,/tmp/step_command.sh: line 1: 2025-10-03T13:52:16.681827Z: command not found
1759499536683,/tmp/step_command.sh: line 1: task: command not found
1759499536684,/tmp/step_command.sh: line 1: 0: command not found
1759499536684,/tmp/step_command.sh: line 1: 2025-10-03T13:52:16.681827065+00:00]Failed: command not found
1759499536684,Failed to download code package from s3://***/c3/c365cb66d0989afafb2a3ec7ee7bcb01504793a1 after 6 tries. Exiting...
I replaced the S3 bucket name and key with *** above, but both the bucket name and key are correct.
Does anyone know why this is happening?
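For reference, the generated one-liner in the log de-escapes to the following plain Python (a reconstruction; the *** values are the redacted bucket and key). The literal \"" sequences in the log suggest the quoting gets mangled somewhere between the state-machine definition and the shell, which would produce exactly this SyntaxError:

import boto3
import os

# reconstruction of the code-package download command shown in the log
ep = os.getenv("METAFLOW_S3_ENDPOINT_URL")
boto3.client("s3", **({"endpoint_url": ep} if ep else {})).download_file(
    "***",  # redacted bucket
    "***/data/c3/c365cb66d0989afafb2a3ec7ee7bcb01504793a1",  # redacted key
    "job.tar",
)

millions-barista-45672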
10/03/2025, 5:15 AM
great-egg-84692
10/02/2025, 9:50 PM
crooked-camera-86023
10/01/2025, 7:10 PM
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 50m default-scheduler Successfully assigned default/postgresql-0 to minikube
Normal Pulling 49m (x4 over 50m) kubelet Pulling image "docker.io/bitnami/postgresql:15.3.0-debian-11-r7"
Warning Failed 49m (x4 over 50m) kubelet Failed to pull image "docker.io/bitnami/postgresql:15.3.0-debian-11-r7": Error response from daemon: manifest for bitnami/postgresql:15.3.0-debian-11-r7 not found: manifest unknown: manifest unknown
Warning Failed 49m (x4 over 50m) kubelet Error: ErrImagePull
Warning Failed 48m (x6 over 50m) kubelet Error: ImagePullBackOff
Normal BackOff 45m (x18 over 50m) kubelet Back-off pulling image "docker.io/bitnami/postgresql:15.3.0-debian-11-r7"
Normal SandboxChanged 43m (x2 over 43m) kubelet Pod sandbox changed, it will be killed and re-created.
Warning Failed 42m (x3 over 43m) kubelet Error: ErrImagePull
Normal BackOff 42m (x6 over 43m) kubelet Back-off pulling image "docker.io/bitnami/postgresql:15.3.0-debian-11-r7"
Warning Failed 42m (x6 over 43m) kubelet Error: ImagePullBackOff
Normal Pulling 41m (x4 over 43m) kubelet Pulling image "docker.io/bitnami/postgresql:15.3.0-debian-11-r7"
Warning Failed 41m (x4 over 43m) kubelet Failed to pull image "docker.io/bitnami/postgresql:15.3.0-debian-11-r7": Error response from daemon: manifest for bitnami/postgresql:15.3.0-debian-11-r7 not found: manifest unknown: manifest unknown
Normal SandboxChanged 35m (x3 over 35m) kubelet Pod sandbox changed, it will be killed and re-created.
Warning Failed 35m (x3 over 35m) kubelet Failed to pull image "docker.io/bitnami/postgresql:15.3.0-debian-11-r7": Error response from daemon: manifest for bitnami/postgresql:15.3.0-debian-11-r7 not found: manifest unknown: manifest unknown
Warning Failed 35m (x3 over 35m) kubelet Error: ErrImagePull
Warning Failed 34m (x6 over 35m) kubelet Error: ImagePullBackOff
Normal Pulling 34m (x4 over 35m) kubelet Pulling image "docker.io/bitnami/postgresql:15.3.0-debian-11-r7"
Normal BackOff 10m (x109 over 35m) kubelet Back-off pulling image "docker.io/bitnami/postgresql:15.3.0-debian-11-r7"
Normal SandboxChanged 5m59s (x2 over 6m4s) kubelet Pod sandbox changed, it will be killed and re-created.
Warning Failed 5m15s (x3 over 5m59s) kubelet Error: ErrImagePull
Warning Failed 4m36s (x6 over 5m59s) kubelet Error: ImagePullBackOff
Normal Pulling 4m23s (x4 over 6m4s) kubelet Pulling image "docker.io/bitnami/postgresql:15.3.0-debian-11-r7"
Warning Failed 4m22s (x4 over 5m59s) kubelet Failed to pull image "docker.io/bitnami/postgresql:15.3.0-debian-11-r7": Error response from daemon: manifest for bitnami/postgresql:15.3.0-debian-11-r7 not found: manifest unknown: manifest unknown
Normal BackOff 52s (x22 over 5m59s) kubelet Back-off pulling image "docker.io/bitnami/postgresql:15.3.0-debian-11-r7"
future-crowd-14830
10/01/2025, 5:24 PM
Transient S3 failure (attempt #1) -- total success: 18, last attempt 18/20 -- remaining: 2
Transient S3 failure (attempt #2) -- total success: 18, last attempt 0/2 -- remaining: 2
Transient S3 failure (attempt #3) -- total success: 18, last attempt 0/2 -- remaining: 2
Transient S3 failure (attempt #4) -- total success: 18, last attempt 0/2 -- remaining: 2
few-dress-69520
10/01/2025, 4:53 PM
I can run my flow with --with kubernetes, and I can deploy it as an argo-workflow and trigger it without problems.
When I install metaflow-netflixext, I can still run it locally; however, when I try to run it remotely in a container, I get an error that some 20 packages were not found in the cache. All of these look like they are required by Metaflow itself, not by the specific flow I'm running.
E.g.
'ld_impl_linux-64-2.44-h1423503_1': not found at packages/conda/conda.anaconda.org/conda-forge/linux-64/ld_impl_linux-64-2.44-h1423503_1.conda/ld_impl_linux-64-2.44-h1423503_1.conda/0be7c6e070c19105f966d3758448d018/ld_impl_linux-64-2.44-h1423503_1.conda
'libgomp-15.1.0-h767d61c_4': not found at packages/conda/conda.anaconda.org/conda-forge/linux-64/libgomp-15.1.0-h767d61c_4.conda/libgomp-15.1.0-h767d61c_4.conda/3baf8976c96134738bba224e9ef6b1e5/libgomp-15.1.0-h767d61c_4.conda
...
'uv-0.7.8-h2f11bb8_0': not found at packages/conda/conda.anaconda.org/conda-forge/linux-64/uv-0.7.8-h2f11bb8_0.conda/uv-0.7.8-h2f11bb8_0.conda/aff01745ebc7e711904866ee2e762a42/uv-0.7.8-h2f11bb8_0.conda
When I look at our datastore location on S3, this is true: the packages are indeed not there. Why are they not there, and what do I have to do to make them available?
future-crowd-14830
10/01/2025, 3:40 PM
some-nail-13772
10/01/2025, 9:02 AM
1 FailedCreatePodSandBox: Failed to create pod sandbox: failed to construct FQDN from pod hostname and cluster domain, FQDN js-5b374c0-control-0-0.js-5b374c0.xxxxxxxxxxxxxxx.svc.cluster.local is too long (64 characters is the max, 74 characters requested)
1. Is there any way to fix it without changing our namespace?
2. If I set set_hostname_as_fqdn to False, is there any impact?
echoing-camera-27293
10/01/2025, 8:09 AM
1. I have a DataFrame with a filename column
2. I have to load each filename and process it (calculate the len, for instance)
Then I created the following flow:
1. Load the DF -> iterate over the filename column, fanning out to the next step with the id as input
2. Load the file using self.input, compute the length, store it in self.len and the id in self.id
3. Converge into a join step that reconciles the lengths by iterating over the inputs: self.df.loc[input.id, 'len'] = self.len
However, the number of files to process is huge, and I'm thinking of moving the processing to the cloud. But the files are only on the host... What's the best way to proceed?
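A minimal sketch of the fan-out/fan-in structure described above (files.csv, the index layout, and the length computation are illustrative stand-ins, not from the original message):

import os

import pandas as pd
from metaflow import FlowSpec, step


class FileLenFlow(FlowSpec):

    @step
    def start(self):
        # hypothetical input: a DataFrame indexed by id with a 'filename' column
        self.df = pd.read_csv("files.csv", index_col="id")
        self.ids = list(self.df.index)
        self.next(self.process_file, foreach="ids")

    @step
    def process_file(self):
        self.id = self.input
        filename = self.df.loc[self.id, "filename"]
        # stand-in for the real per-file computation
        self.len = os.path.getsize(filename)
        self.next(self.join)

    @step
    def join(self, inputs):
        # reconcile the per-file lengths back into one DataFrame
        self.df = inputs[0].df.copy()
        for inp in inputs:
            self.df.loc[inp.id, "len"] = inp.len
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    FileLenFlow()

On the files-only-on-the-host point: tasks running remotely (e.g. with @batch or @kubernetes) cannot see the host filesystem, so the files would first need to be pushed somewhere the tasks can reach, such as S3.

agreeable-ambulance-71794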
10/01/2025, 6:38 AM
bland-garden-80695
09/30/2025, 11:37 PM
acoustic-river-26222
09/30/2025, 9:59 PM
I set up METAFLOW_SFN_DYNAMO_DB_TABLE. I created a flow and deployed it in AWS Step Functions; however, I don't see any items in Dynamo. Every flow has run normally. I'm sharing an image of the table and the step function used to run a simple flow.
happy-journalist-26770
09/30/2025, 6:01 AM
I'm trying to limit the number of parallel tasks (foreach) when running on k8s via argo-workflows.
The CLI one works fine:
uv run example_flow.py --branch dev --environment=uv --with retry argo-workflows create --max-workers 2
but the env/config.json route is not working:
# /home/sln/.metaflowconfig/config.json:
METAFLOW_ARGO_WORKFLOWS_CREATE_MAX_WORKERS=2
# Tried these as well:
METAFLOW_ARGO_WORKFLOWS_MAX_WORKERS
METAFLOW_ARGO_CREATE_MAX_WORKERS
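If no config key covers this, a scriptable alternative is to deploy programmatically; a sketch, assuming the Deployer API forwards keyword arguments to the matching CLI options (the same API the step_functions example earlier in this log uses):

from metaflow import Deployer

# max_workers is assumed to map to `argo-workflows create --max-workers`
Deployer("example_flow.py", environment="uv").argo_workflows().create(max_workers=2)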
hundreds-rainbow-67050
09/29/2025, 4:41 AM
bland-fountain-92046
09/27/2025, 11:17 PM
crooked-camera-86023
09/27/2025, 6:22 PM
nutritious-magazine-38839
09/26/2025, 9:35 AM
My sandbox is stuck provisioning (SandboxProvisioning). @square-wire-39606 @limited-tomato-18674, could you please check if it's something global? I tried with two of my email addresses, so I doubt it's specific to my account. I would love to get it working for a demo I'm doing today at 2:30pm (CET), but I understand if it's not doable (due to it being late at night in the USA), and I'm working on a plan B.