    lively-lunch-9285

    06/10/2025, 11:45 PM
You know, the python <flow>.py ... CLI has a lot going on. There are a lot of arguments and options you can pass in. We could totally make a VS Code extension that turns some of it into a push-button affair:
• killing Kubernetes pods
• resuming (and from which step? 🤔)
• running
• choosing your environment
• launching a debug session that stops if you set a breakpoint
I'm just saying, there are a TON of subcommands. And I've been guilty of asking repeat questions about these, as well as fielding tons of questions about them all.
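For reference, a few of the invocations being alluded to (an illustrative sample, assuming a flow file named flow.py with a step called train; the real surface area is much larger):
python flow.py --environment=pypi run --with retry
python flow.py resume train --origin-run-id 1234
python flow.py logs 1234/train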

    chilly-france-99853

    06/10/2025, 6:47 AM
Hey good people, has anyone used API Gateway (with Lambda) to trigger a Metaflow workflow? Due to the read-only filesystem of Lambda, I'm hitting a wall right now. If not, are there any other alternatives for triggering it as an endpoint? Thanks in advance!
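One way to sidestep the read-only filesystem entirely is to keep Metaflow out of the Lambda and have the handler start the state machine that step-functions create deployed. A minimal sketch, assuming the flow is already deployed and the ARN is known (the input schema below is an assumption -- check how step-functions trigger encodes parameters for your version):
import json
import boto3

sfn = boto3.client("stepfunctions")

def handler(event, context):
    # start the state machine produced by `step-functions create`;
    # the ARN is a placeholder -- look it up in your AWS account
    response = sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:MyFlow",
        input=json.dumps({"Parameters": json.dumps({"alpha": "0.5"})}),
    )
    return {"statusCode": 200, "body": response["executionArn"]}
If you genuinely need Metaflow code inside the Lambda, pointing its working directory and METAFLOW_HOME at /tmp (the only writable path) is the usual workaround.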

    shy-midnight-40599

    06/06/2025, 2:44 PM
Hi, I am using the "step-functions delete" command to delete the step functions and other resources created as part of the "step-functions create" command. It's deleting only the Step Function and the EventBridge rule associated with it, but not the job definitions. Is there a reason behind this? We have a bunch of job definitions that are not used by any step functions and are trying to figure out a way to delete them automatically.
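A hedged cleanup sketch with boto3 for the orphaned definitions (it deregisters every active Batch job definition matching a name prefix; the prefix matching is an assumption about your naming scheme, so dry-run it first):
import boto3

batch = boto3.client("batch")

def deregister_job_definitions(name_prefix, dry_run=True):
    # walk all active job definitions and deregister the matching ones
    paginator = batch.get_paginator("describe_job_definitions")
    for page in paginator.paginate(status="ACTIVE"):
        for jd in page["jobDefinitions"]:
            if jd["jobDefinitionName"].startswith(name_prefix):
                arn = jd["jobDefinitionArn"]
                print(("would deregister:" if dry_run else "deregistering:"), arn)
                if not dry_run:
                    batch.deregister_job_definition(jobDefinition=arn)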

    fast-vr-44972

    06/06/2025, 12:42 PM
Hi, how do you delete a flow completely from Metaflow + Argo Workflows? Is it just removing the template, or something else?
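Recent Metaflow versions appear to ship a matching delete command for the Argo deployment (a sketch; verify it exists in your version). Note that past runs' metadata and artifacts live in the metadata service and datastore, so removing the template only removes the deployment itself:
python flow.py argo-workflows delete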

    gentle-author-38571

    06/06/2025, 6:40 AM
Hi, is it normal for Metaflow to create a local datastore in an ECS container (via AWS Batch) when MF_DATASTORE=s3 is set? I am getting the error below:
Creating local datastore in current directory (/opt/my_app/portfolio_calcs/.metaflow)
Internal error
Traceback (most recent call last):
  File "/opt/my_app/metaflow/metaflow/cli.py", line 554, in main
    start(auto_envvar_prefix="METAFLOW", obj=state)
  . . . .
  File "/opt/my_app/metaflow/metaflow/datastore/content_addressed_store.py", line 140, in load_blobs
    with open(file_path, "rb") as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType
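For comparison, a sketch of the configuration keys open-source Metaflow actually reads for an S3 datastore (note the METAFLOW_ prefix; if MF_DATASTORE is a wrapper variable your platform defines, make sure it translates into these inside the container):
export METAFLOW_DEFAULT_DATASTORE=s3
export METAFLOW_DATASTORE_SYSROOT_S3=s3://my-bucket/metaflow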

    quick-carpet-67110

    06/05/2025, 11:37 AM
Hey everyone! We are trying to orchestrate our Metaflow pipelines with Airflow/Cloud Composer (running on GCP), but when we upload the generated Airflow DAG into our DAGs folder, we get a parsing error. Does this look familiar to anyone? We are using this command to generate the pipeline:
python flow.py --environment=pypi --with retry airflow create flow_dag.py
And this is the parsing error (truncated at the top):
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/python3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 139, in _multicall
        raise exception.with_traceback(exception.__traceback__)
      File "/opt/python3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 103, in _multicall
        res = hook_impl.function(*args)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/etc/airflow/config/airflow_local_settings.py", line 62, in pod_mutation_hook
        and any(env_var.name == "AIRFLOW_IS_K8S_EXECUTOR_POD" for env_var in container.env)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/etc/airflow/config/airflow_local_settings.py", line 62, in <genexpr>
        and any(env_var.name == "AIRFLOW_IS_K8S_EXECUTOR_POD" for env_var in container.env)
                ^^^^^^^^^^^^
    AttributeError: 'dict' object has no attribute 'name'
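The traceback points at the cluster's own /etc/airflow/config/airflow_local_settings.py rather than Metaflow itself: the pod_mutation_hook assumes every entry in container.env is a V1EnvVar object, while the Metaflow-generated pod spec evidently carries plain dicts. A defensive sketch of that hook (an assumption about your local settings file, not Metaflow code):
def _env_name(env_var):
    # env entries may be V1EnvVar objects or plain dicts depending on
    # who built the pod spec -- handle both shapes
    return env_var.get("name") if isinstance(env_var, dict) else env_var.name

def pod_mutation_hook(pod):
    for container in pod.spec.containers or []:
        if any(
            _env_name(env_var) == "AIRFLOW_IS_K8S_EXECUTOR_POD"
            for env_var in (container.env or [])
        ):
            ...  # existing executor-pod handling goes here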

    bland-helicopter-46902

    05/31/2025, 1:06 PM
Are there any strategies for mixing a Metaflow flow with bits of native Step Functions or Airflow operators?

    colossal-scooter-60260

    05/31/2025, 2:27 AM
Hi, first post here! I am learning how to deploy Metaflow on GCP (my end goal is to learn about K8s, ML orchestration frameworks and distributed training from a lower-level angle). I am essentially trying to deploy the template here one step at a time, and it's great! I was wondering what the purpose of each service is. Initially my understanding was:
• Metadata service -> When an instance of a flow is submitted as a run, I understood this service is responsible for saving the code snapshot and data artefacts to the datastore (e.g. S3), and it stores metadata on the run itself (e.g. name, status, artefacts and their locations) in a SQL instance. I also assumed the UI sends requests to this service to render a frontend with the info on runs.
• Metaflow static UI service -> The Metaflow UI React frontend has been previously built, and this service hosts the static frontend assets (e.g. using npm start or something similar).
• Metaflow UI backend service -> Given the above, I am unsure what this is used for. Why is there a UI backend service and a metadata service?
Would anyone be able to explain the purpose of each of these services? I couldn't figure out from the docs why there is a specific UI backend service. Here is my template in case anyone is interested. Thanks so much!

    bulky-portugal-95315

    05/29/2025, 7:00 PM
Hi all, we have our Metaflow UI running on ECS today and we've been hitting some odd dropouts on connections. This is one of the errors we encounter in the UI logs itself. Has anyone else seen this before?

    lively-lunch-9285

    05/29/2025, 3:02 PM
I remember hearing Metaflow is experimenting with an OpenLineage integration. Any updates on that? It looks like you can publish custom lineage events to AWS DataZone. It would be so sick to have a Metaflow example that uses:
• Athena to create some tables in the AWS Glue catalog
• a Metaflow flow that uses Athena and/or DuckDB to query the Glue catalog and write batch inference results to a new table
and then registers the Metaflow flow downstream of the incoming tables and upstream of the inference table. https://aws.amazon.com/blogs/big-data/amazon-datazone-introduces-openlineage-compatible-data-lineage-visualization-in-preview/

    stale-grass-96920

    05/29/2025, 12:27 AM
Hi all! I'm setting up a Metaflow/Argo Workflows stack on our Kubernetes cluster. I'm using MinIO for the datatools and datastore, and am seeing a credentials error when the workflow is triggered (following the 'Episode 8: Autopilot' tutorial). The error is:
botocore.exceptions.NoCredentialsError: Unable to locate credentials
• I'm running this to execute:
python 02-statistics/stats.py argo-workflows create --max-workers 2
• I have my ~/.metaflowconfig/config.json set with METAFLOW_DATASTORE_SYSROOT_S3, METAFLOW_DATATOOLS_S3ROOT, METAFLOW_DEFAULT_DATASTORE (is there a
• IMO Argo Workflows is configured correctly, because it can read/write to the MinIO bucket.
I'm looking through the metaflow service code and see the plugins/aws code uses boto, but I don't see any obvious way to pass MinIO's aws_access_key_id and aws_secret_access_key. Any guidance is appreciated... just point me in the right direction. I feel like I'm missing something obvious.
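For what it's worth, a sketch of the knobs that usually make Metaflow's boto-based S3 client talk to MinIO (assuming open-source config keys; the endpoint and bucket names are placeholders). In ~/.metaflowconfig/config.json:
{
    "METAFLOW_DEFAULT_DATASTORE": "s3",
    "METAFLOW_DATASTORE_SYSROOT_S3": "s3://metaflow/data",
    "METAFLOW_DATATOOLS_S3ROOT": "s3://metaflow/data/data",
    "METAFLOW_S3_ENDPOINT_URL": "http://minio.minio.svc.cluster.local:9000"
}
The credentials themselves ride the standard AWS chain, so the Argo-launched pods need AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY in their environment, e.g. via a Kubernetes secret exposed through METAFLOW_KUBERNETES_SECRETS.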

    hundreds-wire-22547

    05/27/2025, 8:20 PM
Hi, is it possible to modify the labels that are applied to the sensor generated from an ArgoEvent? Looking at the underlying code, I didn't see an env var or anything similar to do this, but maybe I missed it.

    millions-church-9220

    05/27/2025, 9:57 AM
Hello! When I use Metaflow 2.9.9, this error often appears. Does anyone know why? How can I debug this problem?

    dry-angle-21635

    05/27/2025, 8:20 AM
Hey, I'm trying to implement conditional branching to offer a choice of the infrastructure used for computation (see the code snippet below). I've seen that conditional branching is not supported by Metaflow, but do you have alternatives for this use case? Thank you!
@step
def evaluate_regression(self) -> None:
    if self.infra:
        self.next(self.evaluate_on_kubernetes)
    else:
        self.next(self.evaluate_locally)

@kubernetes(
    gpu=1,
    cpu=4,
    memory=16_000,  # 16 GB
    namespace="X",
)
@step
def evaluate_on_kubernetes(self) -> None:
    ...
    self.next(self.join_target_results)

@step
def evaluate_locally(self) -> None:
    ...
    self.next(self.join_target_results)
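Since self.next cannot be called conditionally, one common workaround (a sketch, not an official pattern; self._evaluate is a hypothetical helper) is a static fan-out to both steps, where each branch no-ops unless it was selected and join_target_results becomes a regular join taking inputs:
@step
def evaluate_regression(self) -> None:
    # static split: both successors always run, but only one does real work
    self.next(self.evaluate_on_kubernetes, self.evaluate_locally)

@kubernetes(gpu=1, cpu=4, memory=16_000, namespace="X")
@step
def evaluate_on_kubernetes(self) -> None:
    self.result = self._evaluate() if self.infra else None
    self.next(self.join_target_results)

@step
def evaluate_locally(self) -> None:
    self.result = None if self.infra else self._evaluate()
    self.next(self.join_target_results)
The obvious downside is that the Kubernetes branch still schedules a pod (and reserves its GPU) even when it no-ops, so this trades some cluster time for a static DAG.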

    millions-church-9220

    05/27/2025, 7:08 AM
Hello! When I use Metaflow 2.9.9, this error appears. Does anyone know why? It only shows up occasionally.

    thousands-rocket-55304

    05/26/2025, 11:27 PM
    Hello! I recently applied for a position at Metaflow and was wondering if someone could connect me with someone involved in the hiring process—I’d really appreciate any feedback you might be able to share. Thanks in advance!

    nutritious-coat-36638

    05/26/2025, 3:59 PM
Hi guys! I'm new to Metaflow, so apologies if this is a silly question, but I'm curious about dynamic compute pool selection in flows. I work on a project called CarbonAware, which enables carbon-aware scheduling in workflow orchestrators (the goal being to run workloads in times and places where energy is the greenest). I'm currently investigating building an integration for Metaflow (to go along with existing time-shift integrations for Prefect and Airflow). My question is: is it possible to dynamically select a compute pool at flow runtime in Metaflow? Stealing the example code from the cross-cloud example, would it be reasonable to implement something like the following?
from metaflow import FlowSpec, step, resources, kubernetes
from carbonaware_metaflow import green_kubernetes  # the proposed integration
import urllib.request

class GreenCrossCloudFlow(FlowSpec):

    @kubernetes
    @step
    def start(self):
        req = urllib.request.Request('https://raw.githubusercontent.com/dominictarr/random-name/master/first-names.txt')
        with urllib.request.urlopen(req) as response:
            data = response.read()
        self.titles = data.decode().splitlines()[:10]
        self.next(self.process, foreach='titles')

    @resources(cpu=1, memory=512)
    @green_kubernetes(node_selectors=[
        ("outerbounds.co/provider=azure", "outerbounds.co/region=us-central1"),
        ("outerbounds.co/provider=aws", "outerbounds.co/region=us-west-2"),
    ])
    @step
    def process(self):
        self.title = '%s processed' % self.input
        self.next(self.join)

    @step
    def join(self, inputs):
        self.results = [input.title for input in inputs]
        self.next(self.end)

    @step
    def end(self):
        print('\n'.join(self.results))

if __name__ == '__main__':
    GreenCrossCloudFlow()
    Thanks in advance for the help! Looking forward to working with this community :)

    purple-air-87768

    05/22/2025, 11:57 PM
Hey Metaflow folks, I am wondering if there are any thoughts about data/type validation for Metaflow's data serialization. Currently, artifacts are pickled and unpickled to/from remote storage like S3, and I noticed that there is no explicit data type or schema validation performed before pickling data or after unpickling it from storage. That said, there could be potential issues like silent data corruption, security risks (unpickling arbitrary data can execute malicious code if compromised), and a lack of guarantees for downstream steps. I am thinking we could implement a StepDecorator that dynamically checks the types of variables against an expected schema, e.g.:
@validate(input_schema={var: schema}, output_schema={var: schema})
@step
def step_func():
    ...
The decorator can hook the step function and perform validation at the start and end of the step function (after unpickling input vars and before pickling output vars). I wonder if this is something interesting to Metaflow, or if there is a more native way of doing validation as soon as the serialization happens. Thanks!
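As a plain-Python illustration of the idea (a sketch using a wrapper function with isinstance checks; a real integration would go through Metaflow's StepDecorator plugin machinery, which this does not):
import functools

def validate(input_schema=None, output_schema=None):
    # hypothetical decorator: check artifact types on self before and
    # after the step body runs
    def check(self, schema, phase):
        for name, expected in (schema or {}).items():
            value = getattr(self, name)
            if not isinstance(value, expected):
                raise TypeError(
                    f"{phase} artifact {name!r}: expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )

    def wrap(step_func):
        @functools.wraps(step_func)
        def wrapper(self, *args, **kwargs):
            check(self, input_schema, "input")    # after inputs are unpickled
            result = step_func(self, *args, **kwargs)
            check(self, output_schema, "output")  # before outputs are pickled
            return result
        return wrapper
    return wrap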

    enough-article-90757

    05/22/2025, 4:18 PM
    Hey! I was wondering how Metaflow handles package imports across steps. We're using a separate custom image for each step, where each image has a separate set of packages. Specifically, Metaflow is returning an error saying that it can't find a module from the previous step, even though the erroring step doesn't attempt to import that module. For example, consider these flow steps:
@kubernetes(
    image='a_image_name'
)
@step
def a(self):
    from some_package_for_a import a_logic

    self.output = a_logic()

    self.next(self.b)

@kubernetes
@step
def b(self, inputs):
    print([input for input in inputs])
    I'd expect that to finish without errors, but instead we get:
    2025-05-22 16:02:22.552 [11/b/60 (pid 179819)] ModuleNotFoundError: No module named 'some_package_for_a'
    Why is this the case?
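A plausible explanation (an assumption, not something confirmed here): the downstream task unpickles upstream artifacts, so if self.output is an instance of a class defined in some_package_for_a, deserializing it inside b's image re-imports that package even though b never imports it explicitly. A common workaround is to pass only built-in types across steps whose images differ:
@kubernetes(image='a_image_name')
@step
def a(self):
    from some_package_for_a import a_logic

    # store plain built-ins (dict/list/str/numbers) so unpickling in the
    # next step needs no package-specific imports; .to_dict() is hypothetical
    self.output = a_logic().to_dict()
    self.next(self.b)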

    billowy-agency-44736

    05/22/2025, 7:48 AM
Hi all 🙂 TL;DR - the DAG and stdout are not shown on my deployment, even though the bucket is accessible.
• I deployed Metaflow on EKS. I'm using Aurora Postgres and Amazon S3.
• I can see my workflows on the UI; the issue is that I cannot see the DAG, nor stdout.
• I cannot find any helpful error log.
• I can confirm that I can list all resources in the S3 bucket from the metaflow-ui pod (though strangely there was no boto3 in that Python env).
Will appreciate any help.

    important-bear-42262

    05/22/2025, 7:23 AM
    Hello everyone, I have this function:
import io

import joblib
from metaflow import Flow, S3

def load_model(model_name):
    with S3(run=Flow('MyFlow').latest_successful_run) as s3:
        response = s3.get(model_name)
        model_bytes = io.BytesIO(response.blob)
        model = joblib.load(model_bytes)
    return model
It is trying to store a file metaflow.s3.ee54mdiw when I call it. Is it possible to change the path where it is going to store it?
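If the scratch location is the issue, the S3 client appears to take a tmproot argument controlling where those metaflow.s3.* temp directories are created (a hedged sketch; verify the parameter against your Metaflow version):
with S3(run=Flow('MyFlow').latest_successful_run, tmproot='/tmp') as s3:
    ...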

    gifted-shampoo-74550

    05/20/2025, 8:19 PM
Hey all! Is anyone aware of any API reference docs for the CLI on the Metaflow website? I'm having a hard time finding anything. E.g., I just want to know everything I get from python myflow.py --help (and all the subcommands recursively under that). I was actually curious for the sake of using the Runner API, but the docs just say things like "Additional arguments that you would pass to python myflow.py after the run command" 🙂; further, in some cases these appear to be positional args for the CLI but must be kwargs for the Runner methods. E.g., specifically I was looking for runner.resume(step_to_rerun="step_name"), which, to its credit, does match the CLI --help docs, but feels a bit surprising.
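Absent a published reference, walking --help recursively works, since every subcommand is a click command that prints its own options (the top-level --help lists the subcommands to descend into); e.g.:
python myflow.py --help
python myflow.py run --help
python myflow.py resume --help
python myflow.py argo-workflows --help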

    important-london-94970

    05/20/2025, 6:41 PM
Hey everyone, I have more of a design/best-practices question. I'm building a classification pipeline using a large model from Hugging Face. I'm receiving a list of files to process, storing that in self, and then launching several jobs using foreach with the goal of speeding up the job. Should I use @huggingface_hub to download the model in a previous step, and then use @model to load the model in each parallel step? Or is it better to load the model, store it in self, and then just call self.model.predict() in the parallel steps?

    full-kilobyte-32033

    05/20/2025, 5:49 PM
    I'm hoping it's an error in our setup/KI. Does it sound familiar to anyone?

    full-kilobyte-32033

    05/20/2025, 5:48 PM
    Hey folks, we have a small test environment running with minikube and we're seeing errors like these after a few runs:
    Metadata request (/flows/FlowyFlow) failed (code 500): "{\"err_msg\": {\"pgerror\": \"ERROR:  relation \\\"flows_v3\\\" does not exist\\nLINE 4:             FROM flows_v3\\n                         ^\\n\", \"pgcode\": \"42P01\", \"diag\": {\"message_primary\": \"relation \\\"flows_v3\\\" does not exist\", \"severity\": \"ERROR\"}}}"

    adamant-psychiatrist-48924

    05/20/2025, 8:15 AM
Hi everyone, I've read previously that it is not possible to have multiple schedules for the same flow. Is the recommended way of doing this to use the Config object with multiple config files? If so, the cron would still be unique to the project, branch and flow name. I understand that it is not possible to have multiple crons for the same flow under the same branch and the same project. Are there plans to support this behaviour in the future?
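For reference, a sketch of the multi-deployment pattern being described (assuming the flow uses @project, and that each deployment is created from a config carrying a different cron for @schedule; the branch names are illustrative):
python flow.py --branch hourly argo-workflows create
python flow.py --branch nightly argo-workflows create
Each branch is a separate deployment, so each can carry its own schedule, at the cost of maintaining one deployment per cron.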

    white-helicopter-28706

    05/20/2025, 2:28 AM
[Minor] -> Missing @pypi in the table of contents of the Metaflow docs

    clever-arm-75024

    05/19/2025, 10:35 PM
Hello, I noticed the podcasts are not playable on the Outerbounds webpage. I also noticed that the mp3 URLs for a couple of them are returning 404, so I figured I would let you guys know. Thanks a lot for the Metaflow project. I'm mind-blown by the great tool you guys have made.

    rich-agent-87730

    05/19/2025, 5:01 PM
Hello, I suspect that since this PR introduced connection pooling to the metadata service, we've been getting:
ValueError: Metaflow service [http://metaflow/metadata/service.com] unreachable.
Do you know why this is happening? I'm guessing it's because the service is closing the connection for some reason? Are there any fixes we can apply?

    cuddly-rocket-69327

    05/17/2025, 12:25 AM
Does Metaflow have a way to run a function like an Argo exit_handler? Like a finally in exception handling?