# ask-metaflow
c
Hey good people, has anyone used API Gateway (with Lambda) to trigger a metaflow workflow? Because Lambda's filesystem is read-only, I've hit a wall right now. If not, are there any other ways to trigger it from an endpoint? Thanks in advance!
h
What error are you getting? You have the flow deployed somewhere and are trying to trigger it? Triggering shouldn't require writing to the filesystem
c
The flow is defined in the lambda code. The error I am getting is:
2025-06-10T06:24:32 Metaflow 2.15.15 executing MovieOptimizerFlow for user:AWS_USER
2025-06-10T06:24:33 Project: movie_optimizer, Branch: user.AWS_USER
2025-06-10T06:24:33 Validating your flow...
2025-06-10T06:24:33 The graph looks good!
2025-06-10T06:24:33 Deploying movie_optimizer.user.AWS_USER.MovieOptimizerFlow to AWS Step Functions...
2025-06-10T06:24:34 The namespace of this production flow is
2025-06-10T06:24:34 production:mfprj-54xtlshdyrnf56uy-0-acfk
2025-06-10T06:24:34 To analyze results of this production flow add this line in your notebooks:
2025-06-10T06:24:34 namespace("production:mfprj-54xtlshdyrnf56uy-0-acfk")
2025-06-10T06:24:34 If you want to authorize other people to deploy new versions of this flow to AWS Step Functions, they need to call
2025-06-10T06:24:34 step-functions create --authorize mfprj-54xtlshdyrnf56uy-0-acfk
2025-06-10T06:24:34 when deploying this flow to AWS Step Functions for the first time.
2025-06-10T06:24:34 See "Organizing Results" at <https://docs.metaflow.org/> for more information about production tokens.
2025-06-10T06:24:34 Internal error
2025-06-10T06:24:34 Traceback (most recent call last):
2025-06-10T06:24:34 File "/var/lang/lib/python3.11/site-packages/metaflow/cli.py", line 619, in main
2025-06-10T06:24:34 start(auto_envvar_prefix="METAFLOW", obj=state)
2025-06-10T06:24:34 File "/var/lang/lib/python3.11/site-packages/metaflow/_vendor/click/core.py", line 829, in __call__
2025-06-10T06:24:34 return self.main(*args, **kwargs)
2025-06-10T06:24:34 ^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-10T06:24:34 File "/var/lang/lib/python3.11/site-packages/metaflow/_vendor/click/core.py", line 782, in main
2025-06-10T06:24:34 rv = self.invoke(ctx)
2025-06-10T06:24:34 ^^^^^^^^^^^^^^^^
2025-06-10T06:24:34 File "/var/lang/lib/python3.11/site-packages/metaflow/cli_components/utils.py", line 69, in invoke
2025-06-10T06:24:34 return _process_result(sub_ctx.command.invoke(sub_ctx))
2025-06-10T06:24:34 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-10T06:24:34 File "/var/lang/lib/python3.11/site-packages/metaflow/_vendor/click/core.py", line 1259, in invoke
2025-06-10T06:24:34 return _process_result(sub_ctx.command.invoke(sub_ctx))
2025-06-10T06:24:34 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-10T06:24:34 File "/var/lang/lib/python3.11/site-packages/metaflow/_vendor/click/core.py", line 1066, in invoke
2025-06-10T06:24:34 return ctx.invoke(self.callback, **ctx.params)
2025-06-10T06:24:34 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-10T06:24:34 File "/var/lang/lib/python3.11/site-packages/metaflow/_vendor/click/core.py", line 610, in invoke
2025-06-10T06:24:34 return callback(*args, **kwargs)
2025-06-10T06:24:34 ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-10T06:24:34 File "/var/lang/lib/python3.11/site-packages/metaflow/_vendor/click/decorators.py", line 33, in new_func
2025-06-10T06:24:34 return f(get_current_context().obj, *args, **kwargs)
2025-06-10T06:24:34 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-10T06:24:34 File "/var/lang/lib/python3.11/site-packages/metaflow/plugins/aws/step_functions/step_functions_cli.py", line 183, in create
2025-06-10T06:24:34 token = resolve_token(
2025-06-10T06:24:34 ^^^^^^^^^^^^^^
2025-06-10T06:24:34 File "/var/lang/lib/python3.11/site-packages/metaflow/plugins/aws/step_functions/step_functions_cli.py", line 460, in resolve_token
2025-06-10T06:24:34 store_token(token_prefix, token)
2025-06-10T06:24:34 File "/var/lang/lib/python3.11/site-packages/metaflow/plugins/aws/step_functions/production_token.py", line 71, in store_token
2025-06-10T06:24:34 with open(path, "w") as f:
2025-06-10T06:24:34 ^^^^^^^^^^^^^^^
2025-06-10T06:24:34 OSError: [Errno 30] Read-only file system: '/var/task/.metaflowconfig/mfprj-54xtlshdyrnf56uy'
The only reason I can think of is that in Lambda we can only write to /tmp
s
it seems that the flow is already deployed on step functions. you can simply use boto3 to make a call to step functions to trigger the execution from the lambda
f
@chilly-france-99853 another pattern that works well is to use ArgoEvent in the lambda code, I don't have any public examples to share but here is a demonstration of the pattern I've seen out in the wild using FastAPI (not my project but saw it was public, hopefully they don't mind the share): https://github.com/WGBH-MLA/chowda/blob/main/chowda/routers/dashboard.py#L13 So you would trigger your deployed flow by hitting the apigw, this way you can decouple metaflow client stuff from the caller. Useful when you want to trigger using a different language
oops sorry didn't see it was a step functions one - but ya what Savin said hah
h
Seems like it's trying to deploy it to SFN, if I'm reading the logs properly -- ie it's not already deployed
2025-06-10T06:24:33 Deploying movie_optimizer.user.AWS_USER.MovieOptimizerFlow to AWS Step Functions...
f
yeah I jumped the gun (mind automatically jumps to kubernetes metaflow now), but this should be a blog post with a tf module / cdk construct available 🙂 . In my work I do it a lot to add metaflow into event-driven arch so it deserves a call out I think
c
@flaky-plumber-70709 no worries mate! @hundreds-rainbow-67050 @square-wire-39606 the flow is not deployed to step functions. At least, I do not see it deployed on the console. What I understand is that it's trying to write something to my fs as part of the deployment, which of course does not work in Lambda
f
so you actually can write to the filesystem in Lambda a few different ways: you can mount an EFS, which doesn't make sense here, OR you can write to the ephemeral /tmp, which would probably solve your issue here (/tmp is configurable up to 10GB). I misunderstood your issue before, looks like you're deploying the step function within the lambda, which is not the pattern we were suggesting. Instead, deploy the step function wherever you normally work with metaflow, then trigger that flow from lambda using the `start_execution` method on the boto3 sfn client (https://docs.aws.amazon.com/code-library/latest/ug/python_3_sfn_code_examples.html)
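A minimal sketch of that trigger-from-Lambda pattern, assuming the flow was already deployed with `step-functions create` from a normal dev machine or CI. The `STATE_MACHINE_ARN` environment variable and both function names are hypothetical. Wrapping the flow parameters as a JSON string under a `Parameters` key mirrors what Metaflow's own `step-functions trigger` appears to send, so treat that as an assumption and check it against your Metaflow version.

```python
import json
import os


def start_flow_execution(sfn_client, state_machine_arn, parameters=None):
    """Start one execution of an already-deployed state machine."""
    # Assumption: Metaflow-deployed state machines read flow parameters
    # from a JSON-encoded string stored under the "Parameters" key.
    payload = json.dumps({"Parameters": json.dumps(parameters or {})})
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=payload,
    )
    return response["executionArn"]


def handler(event, context):
    """Hypothetical Lambda entry point behind an API Gateway proxy route."""
    import boto3  # imported lazily so the helper above has no AWS dependency

    arn = os.environ["STATE_MACHINE_ARN"]  # hypothetical env var name
    body = json.loads(event.get("body") or "{}")
    execution_arn = start_flow_execution(
        boto3.client("stepfunctions"), arn, body
    )
    return {
        "statusCode": 202,
        "body": json.dumps({"executionArn": execution_arn}),
    }
```

Nothing here imports metaflow or touches the filesystem, so the read-only `/var/task` never comes into play. The Lambda role needs `states:StartExecution` on the state machine.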
@chilly-france-99853 you would need this lambda to have the IAM permissions for this, but you would run into similar issues to the one you hit before, namely:
1. metaflow would try to write to a read-only dir by default
2. your metaflow config json values are not present in the lambda
What I've done in the past is to save the metaflow config json to parameter store and then write a helper function that retrieves those values, writes them to `/tmp/.metaflowconfig/config.json` and then sets `METAFLOW_HOME` to the `/tmp/.metaflowconfig/` dir. If you have the metaflow lib installed, you're good to go. Lambda powertools makes this a lot easier with their various helpers for params and api gw: https://docs.powertools.aws.dev/lambda/python/latest/utilities/parameters/ In my case I basically registered the helper as lambda powertools middleware (https://docs.powertools.aws.dev/lambda/python/latest/utilities/middleware_factory/). Putting that all together you can then have a reusable module / cdk construct if you handle the argo event / step function name as an environment variable. You're then left with a lambda function that has metaflow configured, ready to invoke sfn / argo workflows, that you can then put into apigw, event bridge, use as a dynamodb stream, wherever
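The config helper described above could look roughly like the sketch below. It uses a plain boto3-style SSM client rather than Lambda Powertools, and the function names and the `/metaflow/config` parameter name are hypothetical. It also assumes Metaflow reads `METAFLOW_HOME` when it is first imported, so the helper should run before any `import metaflow`.

```python
import json
import os
from pathlib import Path


def load_config_from_parameter_store(ssm_client, name):
    """Fetch the Metaflow config JSON stored as one SSM parameter."""
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


def configure_metaflow(config_json, home="/tmp/.metaflowconfig"):
    """Write config.json under a writable dir and point METAFLOW_HOME at it."""
    home_dir = Path(home)
    home_dir.mkdir(parents=True, exist_ok=True)
    # Round-trip through json so a malformed parameter fails early.
    config = json.loads(config_json)
    config_path = home_dir / "config.json"
    config_path.write_text(json.dumps(config))
    # Assumption: must be set before metaflow is imported in this process.
    os.environ["METAFLOW_HOME"] = str(home_dir)
    return config_path


def handler(event, context):
    """Hypothetical Lambda entry point showing the call order."""
    import boto3

    raw = load_config_from_parameter_store(
        boto3.client("ssm"), "/metaflow/config"  # hypothetical parameter name
    )
    configure_metaflow(raw)
    import metaflow  # noqa: F401  (imported only after METAFLOW_HOME is set)

    return {"statusCode": 200, "body": "metaflow configured"}
```

The Lambda role needs `ssm:GetParameter` on that parameter (plus KMS decrypt if it is a SecureString). Running the helper once per cold start, for example at module import, avoids refetching the parameter on every invocation.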
c
Solved, we used a similar workflow while building the container. Copied the metaflow config to /tmp and modified it to write to /tmp