# dev-metaflow
u
Hi all, I wanted to raise a new (minor) feature for discussion that I think could have a lot of utility (especially for Metaflow operators/admins). Here's a link to the issue, but in a nutshell: what if Metaflow took advantage of the ability for Batch to propagate tags from jobs to the actual ECS tasks in order to inject Metaflow metadata like run_id, flow_name, etc, as resource tags on the ECS tasks that actually run the job? This would be beneficial for monitoring systems, and also for cost attribution tools.
4
s
Makes sense. I have opened a PR to address this - https://github.com/Netflix/metaflow/pull/631
This PR would track metaflow.run_id, metaflow.flow_name, metaflow.step_name, metaflow.version and metaflow.user. Since you can use any tag as namespace, we need to think through how to surface metaflow tags as resource tags.
👍 1
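For readers following along, here is a rough sketch of what tagging a Batch job with the keys listed above might look like when calling the Batch API directly through boto3. It is not the code from the PR. The helper names `build_submit_job_request` and `submit`, and every argument value, are made up for illustration. The `tags` and `propagateTags` parameters belong to the Batch SubmitJob API.

```python
def build_submit_job_request(job_name, job_queue, job_definition,
                             run_id, flow_name, step_name, user, version):
    """Build keyword arguments for a Batch SubmitJob call (hypothetical helper).

    The tag keys mirror the ones named in the thread; the values are
    whatever the caller passes in, converted to strings.
    """
    tags = {
        "metaflow.run_id": str(run_id),
        "metaflow.flow_name": str(flow_name),
        "metaflow.step_name": str(step_name),
        "metaflow.user": str(user),
        "metaflow.version": str(version),
    }
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "tags": tags,
        # Ask Batch to copy the job's tags onto the underlying ECS task.
        "propagateTags": True,
    }


def submit(request):
    """Send the request to AWS Batch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    return boto3.client("batch").submit_job(**request)
```

Building the request separately from sending it keeps the tagging logic easy to check without AWS access, for example `build_submit_job_request("job", "queue", "jobdef", 123, "MyFlow", "start", "alice", "x.y.z")["tags"]`.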
u
That's a good point. When I proposed namespace originally, I guess I was thinking of the production token for SFN executions.
u
(since in our setup, we set that to the branch name and frequently use it as a results namespace)
s
I can add branch name directly as well.
a
The release is in flight - you should have the package available within the next hour - https://github.com/Netflix/metaflow/actions/runs/1080541216
🚤 3
w
I'm not sure that this behavior will actually provide per-job cost allocation when using EC2 (vs. Fargate). We (AWS Batch) need to do work here to make EC2 cost allocation better.
👍 2
u
@worried-machine-92008 Yes, this is unfortunately a sore point. Having the tags lets you (if you enable the tags as cost allocation tags) get tagged metering data in Cost Explorer (vCPU Hours and GB Hours for cpu and memory, respectively); there's still a lot of machinations required by the end user to convert that back into an accurate-ish chargeback based on the instances utilized.
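As a rough illustration of the Cost Explorer side, the sketch below builds a query grouped by one of the tags. It assumes the tag has already been activated as a cost allocation tag. The helper names `build_cost_query` and `fetch_costs` and the default tag key are assumptions for illustration only. It does not attempt the chargeback conversion described above; it only retrieves the tagged usage data.

```python
from datetime import date


def build_cost_query(start, end, tag_key="metaflow.flow_name"):
    """Build keyword arguments for Cost Explorer's GetCostAndUsage (hypothetical helper).

    `start` and `end` are datetime.date objects; Cost Explorer treats the
    end date as exclusive.
    """
    return {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost", "UsageQuantity"],
        # Group results by the value of the chosen cost allocation tag.
        "GroupBy": [{"Type": "TAG", "Key": tag_key}],
    }


def fetch_costs(query):
    """Run the query (needs boto3, AWS credentials, and Cost Explorer access)."""
    import boto3  # imported lazily so the builder above works without boto3

    return boto3.client("ce").get_cost_and_usage(**query)
```

For example, `build_cost_query(date(2021, 7, 1), date(2021, 8, 1))` describes one month of daily data grouped by flow name.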
w
We are working on it, is all I can say at this point 🙂 Also Fargate can do per-job cost tracking but to my knowledge this is still not supported in Metaflow?
💯 1
u
I believe Fargate compute environments actually are supported by Metaflow, though it won't work well in our case since the flows we're most interested in tracking cost for have resource requirements well in excess of what Fargate allows 😞
👍 1