# dev-metaflow
c
Hi all, following up on a question I asked in the Metaflow gitter.
> Hello, I am interested in overriding the default log driver for the Batch job definition so that we can ingest logs from the Metaflow Batch jobs running in ECS into Datadog as opposed to CloudWatch Logs. How would I go about overriding the default job definition which Metaflow creates on my behalf when sending tasks to Batch job queues?
Currently, we are using the default log driver, which is awslogs.
But we are looking for the best way to ingest logs into Datadog from the individual Metaflow tasks orchestrated by Batch.
cc: @ancient-application-36103
s
Let me do some digging and get back to you in a few minutes.
❤️ 1
c
Sounds great, thank you @ancient-application-36103
s
A simple way would be to set up a bridge between CloudWatch Logs and Datadog that forwards all Metaflow logs to Datadog.
All logs for Metaflow AWS Batch jobs go into the `/aws/batch/job` log group, and the log streams are prefixed with `metaflow_`.
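As a concrete illustration (not from the thread), here is a minimal boto3 sketch that lists those Metaflow log streams; it assumes default AWS credentials and region, and only the log group name and stream prefix come from the message above:
```python
# Minimal boto3 sketch: list Metaflow's AWS Batch log streams.
# Assumes default AWS credentials/region; group name and prefix as noted above.
import boto3

logs = boto3.client("logs")

paginator = logs.get_paginator("describe_log_streams")
for page in paginator.paginate(
    logGroupName="/aws/batch/job",
    logStreamNamePrefix="metaflow_",
):
    for stream in page["logStreams"]:
        print(stream["logStreamName"])
```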
Using the client API you can get access to the CloudWatch log group and stream name for each Metaflow task as well:
```python
from metaflow import Run

# CloudWatch log stream and log group recorded for the 'start' task of run 42
Run('MyFlow/42')['start'].task.metadata_dict['aws-batch-awslogs-stream']
Run('MyFlow/42')['start'].task.metadata_dict['aws-batch-awslogs-group']
```
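As a follow-up illustration (again not from the thread), those two metadata values could be fed straight into boto3 to pull a task's log events; the flow name and run id are placeholders:
```python
from metaflow import Run
import boto3

# Placeholder flow name and run id.
task = Run("MyFlow/42")["start"].task
group = task.metadata_dict["aws-batch-awslogs-group"]
stream = task.metadata_dict["aws-batch-awslogs-stream"]

# Fetch the task's log events from CloudWatch Logs.
logs = boto3.client("logs")
resp = logs.get_log_events(
    logGroupName=group,
    logStreamName=stream,
    startFromHead=True,
)
for event in resp["events"]:
    print(event["message"])
```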
It seems the only way to forward CloudWatch Logs to Datadog is via a forwarder Lambda - https://docs.datadoghq.com/serverless/libraries_integrations/forwarder/
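A rough sketch of that wiring with boto3, assuming the Datadog Forwarder Lambda from the link above is already deployed; the forwarder ARN, account id, and region below are placeholders:
```python
import boto3

logs = boto3.client("logs")
lam = boto3.client("lambda")

# Placeholders - substitute your own account, region, and deployed forwarder ARN.
forwarder_arn = "arn:aws:lambda:us-east-1:123456789012:function:datadog-forwarder"
log_group = "/aws/batch/job"

# Allow CloudWatch Logs to invoke the forwarder for this log group.
lam.add_permission(
    FunctionName=forwarder_arn,
    StatementId="metaflow-batch-logs",
    Action="lambda:InvokeFunction",
    Principal="logs.amazonaws.com",
    SourceArn=f"arn:aws:logs:us-east-1:123456789012:log-group:{log_group}:*",
)

# Subscribe the Metaflow Batch log group to the forwarder.
logs.put_subscription_filter(
    logGroupName=log_group,
    filterName="datadog-metaflow",
    filterPattern="",  # empty pattern forwards every log event in the group
    destinationArn=forwarder_arn,
)
```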
Also, Metaflow currently stores all the logs in S3 as well, but configuring the Datadog forwarder for S3 may be a bit more involved.
@cool-father-45885 Would something like this work or were you hoping for a more native integration where Metaflow forwards the logs to Datadog?
It should also be possible to run a custom Docker image with a Datadog agent that automatically forwards all the logs to Datadog - I haven't tried it myself, though.
c
A more native integration would certainly be ideal. For now, it sounds like the log forwarding solution is our best option, requiring the least amount of added maintenance.
👍 1
r
We were hoping for something more native, but the Lambda solution will work. Thank you @square-wire-39606! Follow-up question: does Metaflow have a mechanism to add tags that will propagate to the underlying components of AWS Batch?
s
Very timely question! In our recent release, we have started to tag the underlying instances with metadata about the AWS Batch job - https://github.com/Netflix/metaflow/releases/tag/2.3.3
🙌 2
Would this suffice or were you interested in also forwarding custom user tags?
r
This looks great, we will try to get this set up. Thank you for all the help!
👍 1
s
I have also opened a GitHub issue - https://github.com/Netflix/metaflow/issues/643 - to track Datadog integration. At minimum, we should document the Lambda bridge pattern.
👍 2
@rapid-garden-26667 Great! Let us know any other feedback/questions you may have!
c
Thank you very much @ancient-application-36103
🙇 1
p
@rapid-garden-26667 What were you hoping to tag - the Batch jobs or the EC2 instances (which have costs)? Tags on Batch jobs will not show up in Cost Explorer... (at least when running on EC2 instances; maybe they do with Fargate).