# dev-metaflow
b
Hey team 👋 I've made a PR to add custom AWS Batch tagging to Metaflow - could you please have a look? We need this for cost attribution across teams. https://github.com/Netflix/metaflow/pull/2144 Thanks @square-wire-39606
friendly bump
s
Thanks Rikish. cc @bulky-afternoon-92433 for review
b
Hey @bulky-afternoon-92433, thank you for reviewing. I've changed the "tags" variable to "aws_tags" so it's more consistent now. As for emitting a warning when tags are supplied but METAFLOW_BATCH_EMIT_TAGS is not set or is set to false - do you have any examples of how warnings are raised in this repo, just so I can keep the pattern the same?
t
excellent. I'll take a look later today. On the warnings side, I think we can proceed without them. Even though the UX could be nice, at second glance it seems to introduce unnecessary complexity to the mix.
finished my review. still some changes to be done before it's ready, and I noted that I wasn't able to run the tests at all in the current state of the PR without some changes. Open to chat about the changes needed if there's anything unclear about them as well 🙂
b
Hey @bulky-afternoon-92433, quick question - you suggested merging the BATCH_DEFAULT_TAGS environment variable into the decorator AWS tags. I don't want to go down this route, as we want BATCH_DEFAULT_TAGS to stay separate and apply tags even when the batch decorator is not used. We want the batch decorator to be the last-applied tags option. I'm trying to find a place to move the validation of BATCH_DEFAULT_TAGS out to - what would be the best place so that it validates before the batch job is created and still remains separate from decorator logic? I'm thinking of validating inside batch_cli.py
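For reference, the validation being discussed could be sketched roughly like this - a minimal sketch only, with a hypothetical helper name (`validate_aws_tags` is not actual Metaflow code):

```python
# Hypothetical sketch of the validation discussed above; the helper name
# and error wording are assumptions, not Metaflow's actual code.
def validate_aws_tags(tags, source="BATCH_DEFAULT_TAGS"):
    # Reject anything that is not a flat Dict[str, str] before the
    # Batch job is created.
    if not isinstance(tags, dict):
        raise ValueError("%s must be Dict[str, str]" % source)
    for key, value in tags.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise ValueError(
                "%s entries must map str to str, got %r: %r" % (source, key, value)
            )
    return tags
```

Calling a helper like this from both the CLI path and the decorator path would keep the validation separate from decorator logic, as described above.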
t
this should still be doable inside the batch decorator even for that use case. Whenever you are running on batch, there is a batch decorator present (either explicit or implicit). Using step-functions for example, the step functions CLI attaches a batch decorator under the hood if missing. This would then use the environment variable as expected.
added a new round of review comments and a stacked PR on top of yours. Can you see if the changes make sense or if they fail to cover a use case you had in mind?
b
Hey @bulky-afternoon-92433 I tried with your changes but we still need to be able to tag from different entry points. Currently running into a weird situation. I'm running this flow:
from metaflow import (
    FlowSpec,
    step,
    batch,
    retry,
    schedule,
    project,
    conda,
    current
)
import logging
from common import (
    logging_setup
)

logging_setup.configure()


custom_step_tags = {
    "goodbye": "world",
    "hello": "universe",
    "batch_decorator_tag": "True",
    #"inv@lid": "t@g"
}

custom_step_tags_v2 = {
    "something": "else",
    "batch_decorator_tag": "True"
}

@project(name="ml_apac_trisolaris")
@schedule(hourly=True)
class CanarySimpleAdhoc(FlowSpec):
    @step
    def start(self):
        self.logger = logging.getLogger(self.__class__.__name__)
        self.logger.info("Canary Hello")
        self.next(self.hello)

    @batch(cpu=1, memory=500, aws_batch_tags=custom_step_tags)
    @retry
    @step
    def hello(self):
        self.logger.info("Canary Hello World in Batch!")
        self.next(self.goodbye)

    @batch(cpu=1, memory=500, aws_batch_tags=custom_step_tags_v2)
    @retry
    @step
    def goodbye(self):
        self.logger.info("Canary Goodbye World in Batch!")
        self.next(self.end)


    @step
    def end(self):
        self.logger.info("HelloAWS is finished.")


if __name__ == "__main__":
    CanarySimpleAdhoc()
But it seems the start step is receiving all the decorators at once - is this expected? Here's the log from the batch job for the start step:
2025-03-03T22:10:44.773Z AWS Batch error:
2025-03-03T22:10:44.773Z aws_batch_tags is not a dictionary must be Dict[str, str]
2025-03-03T22:10:44.773Z Recieved BATCH_DEFAULT_TAGS as: {}
2025-03-03T22:10:44.714Z Recieved BATCH_DEFAULT_TAGS as: {}
2025-03-03T22:10:44.714Z Recieved Decorator AWS Tags Dictionary as: {'something': 'else', 'batch_decorator_tag': 'True'}
2025-03-03T22:10:44.713Z Recieved BATCH_DEFAULT_TAGS as: {}
2025-03-03T22:10:44.713Z Recieved Decorator AWS Tags Dictionary as: {'goodbye': 'world', 'hello': 'universe', 'batch_decorator_tag': 'True'}
2025-03-03T22:10:41.471Z Task is starting.
2025-03-03T22:10:41.421Z Code package downloaded.
2025-03-03T22:10:40.335Z Downloading code package...
2025-03-03T22:10:34.682Z Setting up task environment.
Also it seems the environment variable METAFLOW_BATCH_DEFAULT_TAGS does not get carried over when deploying to step functions - it's not present in the job definition, and in the logs it's coming through as the default {}. Is there another place that needs updating?
t
hmm, no, the start step shouldn't be receiving more than its own decorators. I'll take a look at this in the morning as it's quite late for me right now. for step functions you might need to add the default tags env var in step_functions.py, where the overall state-machine setup is done
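Passing the env var through could look something like the sketch below - the function name is an assumption, not Metaflow's actual step_functions.py internals, though the `{"name": ..., "value": ...}` shape is what AWS Batch container environment entries expect:

```python
import json
import os

# Sketch only: forward METAFLOW_BATCH_DEFAULT_TAGS into the container
# environment when building the state machine, so Batch tasks triggered
# by Step Functions see the same defaults as local runs. The function
# name is hypothetical, not Metaflow's actual code.
def default_tags_env_entry():
    raw = os.environ.get("METAFLOW_BATCH_DEFAULT_TAGS", "{}")
    json.loads(raw)  # fail fast at deploy time if the JSON is malformed
    # AWS Batch container environment entries use [{"name": ..., "value": ...}]
    return {"name": "METAFLOW_BATCH_DEFAULT_TAGS", "value": raw}
```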
b
Of course, no worries, whenever you get time. I'll take a look into the step functions env var.
b
Before I get to any testing, it seems that my stacked PR which was merged last week was maybe lost in a rebase or something? I don't see any of my changes in your branch
b
Ah whoops, messed up the rebase. I couldn't use your full changes as I still need the step functions CLI tagging option, but I think I still need the batch_decorator.py change from it. I can fix the rebase tomorrow morning.
Hey, I've tested again and got it working by splitting the aws_batch_tags in the decorator into another variable in there called aws_batch_tags_list - without this split it seemed like there was a loop somewhere, and already-processed tags were being sent through twice, which was causing issues. The PR still needs cleanup and some variable name fixes, like you suggested before with changing aws_batch_tags to aws_tags in batch-related files. Once you are happy with the setup I will clean things up.

Another thing to note: I commented out the recommendation you made for batch_decorator in runtime_init - it seems this is not necessary, but if you think this could cause issues with a different setup I'm happy to run more tests and discuss.

From my testing it does look like there is a difference between what happens in Batch from a deployed step function vs what happens when deploying - the BATCH_DEFAULT_TAGS variable is correctly assigned when deploying and shows up in the correct place in the temporary logging I have, but when it runs on Batch from a step function trigger, what is supposed to log in the BATCH_DEFAULT_TAGS temporary logging instead gets logged through the aws_batch_tags temporary logging. Not too sure what is causing this mixup.
Hey @bulky-afternoon-92433, whenever you have time could you please have a look at this again? Thank you
t
Heya, sorry it's taken a while to get back to this. I'll have a look over the weekend / next Monday and test things out. Just to verify, as I noticed that it was still in the PR: do you have a use case in mind that requires the CLI option for step-functions that allows setting batch tags as well? Wondering if we can drop this or not, as it complicates the setup quite a bit and, as I see it, the environment variable serves the same purpose already
b
Hey, no worries, whenever you get some free time. We do prefer to have the CLI option for the step function create command - we have a large number of teams deploying via the same job / script which generates the create setup, so this is the easiest way for us to enforce correct team tagging for cost attribution across all deploys. Relying on the env var entry point to enforce this is not as nice for us, but possible.
c
Would love this feature as well, so ➕ from my side. Do I understand this feature correctly - would we be able to add AWS tags on all our flows that can later be filtered in AWS Cost Explorer?
b
Do I understand this feature correctly - would we be able to add AWS tags on all our flows that can later be filtered in AWS Cost Explorer?
Yes, it adds tags to the Batch jobs created by flows, which can then be filtered for cost attribution across teams
s
This is what we have been searching for. If we could get this feature released, it would be awesome!!
b
Hey @bulky-afternoon-92433, when you get time could you please review? I've updated the variable names
t
had a quick look today and it seems it's currently not working as intended (?) I'll share my test setup later today once I'm back home, so you'll have an easier time testing it out as well. can we agree on the target functionality though - is this the complete set we're aiming for:

1. CLI options for step-functions
python flow.py step-functions create --aws-batch-tag some=tag --aws-batch-tag another=tag
2. Defaults through env var / metaflow config
METAFLOW_BATCH_DEFAULT_TAGS='{"tag1":"value1","tag2":"value2"}' python flow.py run --with batch
3. Step-specific tags with the @batch decorator
@batch(aws_batch_tags={"tag1": "val1", "tag2": "val2"})
and the order of merging tags is env < cli < decorator. this is a rough draft, so pytest tag_test.py should make it easier to validate the functionality. As I noted, right now the branch fails to execute with these though. I'll add the step-functions coverage later on
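The agreed precedence above (env < cli < decorator) could be sketched like this - function and variable names here are illustrative assumptions, not the PR's actual code:

```python
import json
import os

# Illustrative sketch of the env < cli < decorator precedence;
# merge_batch_tags is a hypothetical name, not the PR's actual function.
def merge_batch_tags(cli_tags=None, decorator_tags=None):
    # Defaults come from the Metaflow config / environment variable.
    env_tags = json.loads(os.environ.get("METAFLOW_BATCH_DEFAULT_TAGS", "{}"))
    merged = dict(env_tags)              # env defaults first
    merged.update(cli_tags or {})        # CLI options override env
    merged.update(decorator_tags or {})  # @batch decorator wins last
    return merged
```

With overlapping keys, the decorator value ends up on the Batch job, which matches the "batch decorator is the last-applied tags option" requirement stated earlier in the thread.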
b
Thank you, will take a look today
Hey @bulky-afternoon-92433 - tested on step functions and it looks good to me, I can see the tags come through. You do need to set the env var METAFLOW_BATCH_EMIT_TAGS to true, as the feature is still gated behind that flag.