# ask-metaflow
h
So I managed to schedule a few flows using AWS Step Functions and AWS Batch for compute, which is great. Let's say I schedule a flow daily at 1am UTC, and I would like to run other flows consecutively, depending on when the scheduled flow has completed; its run time may fluctuate between, say, 1-2 hours. Can this be achieved with just AWS and Metaflow? (I don't think I can pester DevOps to also get Argo up, nor can I currently face the learning curve.) Ideally I would like to keep the flows separated as well. Any feedback very much appreciated please. Thanks! PS: Just looking at this: https://docs.metaflow.org/production/event-triggering/flow-events which may be the solution for my use case?
r
@happy-wolf-7852 I think the `trigger_on_finish` decorator can work for your situation. I was going to suggest that documentation as a starting point.
That's similar to what we do. The first flow is scheduled, and the flows that follow all use `trigger_on_finish` on the flow they depend on. This lets us make sure the upstream flow finishes first and that its run is available (which we check in the start step).
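For reference, a minimal sketch of that pattern (flow names are hypothetical, and note that, as comes up below, `trigger_on_finish` takes effect when the flows are deployed to Argo Workflows):

```python
from metaflow import FlowSpec, step, trigger_on_finish, current

# Hypothetical downstream flow: deployed without a schedule, it starts
# automatically whenever a deployed run of FirstFlow finishes successfully.
@trigger_on_finish(flow='FirstFlow')
class SecondFlow(FlowSpec):

    @step
    def start(self):
        # The triggering run is exposed via current.trigger, so you can
        # verify it (or read its artifacts) before doing any work.
        upstream_run = current.trigger.run
        print(f"Triggered by {upstream_run.pathspec}")
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    SecondFlow()
```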
Most of our infra is on GCP, but we switched Metaflow to AWS because it seemed more straightforward using the AWS services.
h
Thanks but having reread this, does this not require Argo?
```
python firstflow.py argo-workflows create
python secondflow.py argo-workflows create
```
r
I'll have to circle back with the team to double check. I might be misspeaking, but I remember we had moved Metaflow to AWS specifically to not have to use Argo, but maybe I missed something.
h
Thanks @rich-toothbrush-6560, yes that's what I am doing - I schedule flows as Step Functions state machines (e.g. daily at 1am or every N hours). However, I struggle somewhat with establishing dependencies without having to assume that scheduled flow 1 always completes before 1am + 35 minutes, which may work for now … so ideally I want to be able to trigger consecutive flows on the back of a completed "AWS Step Functions scheduled flow" …
r
yeah, totally makes sense. Our flow run times are highly variable as well, plus we pass data between flows, so we definitely need the upstream flow to complete first, otherwise we are just repeating a previous run 😅
h
I take it you must use Argo?
r
I'll circle back with the team. I could have sworn they removed Argo, but it wouldn't be the first time I was wrong this week.
👍 1
@happy-wolf-7852 yeah confirmed, I guess they moved Argo over as well. But it should be possible as of recently. This looks interesting: https://docs.aws.amazon.com/step-functions/latest/dg/sample-start-workflow.html
👍 1
h
As a workaround you can trigger the other SFNs directly via `boto3`
🙌 1
👍 1
r
@hundreds-rainbow-67050 was there a limitation to implementing this? Maybe a good feature to pick up?
h
@hundreds-rainbow-67050 thanks - any code snippets you could point me to please?
h
I don't have the historical context, but this is also a feature that my team has wanted in the past
I don't have access to the actual code anymore, but it was something like the following. IIRC you also have to grant permissions to the Batch role to trigger SFN (and describe-executions if you want to wait for the run to complete)
```python
import boto3
import json

def trigger_sfn(state_machine_arn, input_data):
    # Start an execution of the given Step Functions state machine,
    # passing the input payload as JSON.
    sfn_client = boto3.client('stepfunctions')
    input_json = json.dumps(input_data)
    params = {
        'stateMachineArn': state_machine_arn,
        'input': input_json
    }
    return sfn_client.start_execution(**params)

if __name__ == "__main__":
    state_machine_arn = 'arn:aws:states:us-east-1:123456789012:stateMachine:YourStateMachineName'
    input_data = {
        'Parameters': {
            'key1': 'value1',
            'key2': 'value2'
        }
    }

    response = trigger_sfn(state_machine_arn, input_data)
    print(response['executionArn'])
```
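If you want the caller to block until that downstream execution finishes (per the describe-executions note above), a minimal polling sketch might look like this — the function name and interval are just illustrative:

```python
import time
import boto3

def wait_for_execution(execution_arn, poll_seconds=30):
    # Poll describe_execution until the execution leaves the RUNNING state.
    sfn_client = boto3.client('stepfunctions')
    while True:
        status = sfn_client.describe_execution(executionArn=execution_arn)['status']
        if status != 'RUNNING':
            return status  # SUCCEEDED, FAILED, TIMED_OUT or ABORTED
        time.sleep(poll_seconds)
```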
h
So from what I understand, the flows all have to be deployed as AWS Step Functions state machines, but the dependent one won't have a schedule decorator? I may start there and see. Thanks!
h
yes that's the idea
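For completeness, a rough sketch of that split, with hypothetical flow names: only the upstream flow carries a schedule, and both are deployed with `python firstflow.py step-functions create` / `python secondflow.py step-functions create`.

```python
from metaflow import FlowSpec, schedule, step

# Hypothetical upstream flow: deployed to Step Functions with a daily
# 1am UTC schedule. The downstream flow omits @schedule entirely and is
# started from this flow's end step (e.g. via boto3 as above).
@schedule(cron='0 1 * * ? *')
class FirstFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    FirstFlow()
```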
👍 1
s
@happy-wolf-7852 - Nissan's pattern is now a little bit easier with the Deployer API
you can embed the deployer method in your end step and it will take care of invoking the downstream flow for you
@brainy-truck-72938 might have a few examples handy
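Until those examples materialise, here's a rough sketch of the idea, assuming the downstream flow lives in a hypothetical secondflow.py that is available to the task running the end step, and that the role running it has permission to deploy and trigger on Step Functions:

```python
from metaflow import FlowSpec, step

class FirstFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.end)

    @step
    def end(self):
        # Deploy (or re-deploy) the downstream flow to Step Functions and
        # trigger a run of it as soon as this flow reaches its end step.
        from metaflow import Deployer
        deployed = Deployer('secondflow.py').step_functions().create()
        deployed.trigger()

if __name__ == '__main__':
    FirstFlow()
```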
❤️ 1
👀 1
h
I have scheduled everything at certain times for now. Since I use AWS Batch and AFAIK stuff is only executed sequentially (like a FIFO) in our current setup, things may actually "synchronise". I will let it run overnight and have a look tomorrow 🙃.
h
@ancient-application-36103 good point. I guess Runner could also work? When should someone use Runner vs Deployer?
b
I can try to come up with an example that uses the Deployer API within the `end` step
@hundreds-rainbow-67050 Deployer is for *deploying* a flow to a production-grade orchestrator such as Argo Workflows or Step Functions, and triggering the flows which are deployed there... Runner is for running a flow (that still might target a mix of compute platforms such as `kubernetes`, `batch`) -- but the flow is not run via an explicit orchestrator i.e. via Argo, Step Functions
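For contrast, a minimal sketch of the Runner side (flow file name hypothetical): the run is orchestrated by the calling process rather than by Argo or Step Functions, even if individual steps execute remotely.

```python
from metaflow import Runner

# Blocking call: the run is orchestrated from this process; individual steps
# may still run remotely (e.g. on AWS Batch) if the flow requests it.
running = Runner('secondflow.py').run()
print(running.status)  # e.g. 'successful' once the run completes
print(running.run)     # the corresponding Run object in the Metaflow Client API
```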
❤️ 1