# dev-metaflow
s
I have been thinking about how Metaflow should leverage infrastructure as code (CloudFormation/Terraform) to provision the infrastructure it needs to run, and whether Metaflow should also provision additional infrastructure on demand, such as compute. Thoughts?
n
as in, Metaflow would create Batch clusters and whatnot?
s
yes, something along those lines, but it's hard to figure out by myself which ones would make sense, if any at all. Maybe a tradeoff between setting up the whole infra up front vs adding components afterwards as new flows demand them
n
Yea serverless and cortex do that for you too. It is really nice for getting started but at the same time gets more and more awkward if infrastructure gets complex, at which point you're at risk of (poorly) reimplementing terraform / cloudformation UX in your product
s
I see, probably for the best to focus on using the infrastructure effectively and leaving the infra setup up to each user with their preferred tool of choice
n
i think it can be done elegantly as long as you don't need to touch any networking (VPCs, security groups and whatnot)
when it comes to networking, there is too much stuff that most users would want to configure themselves at least to some degree
s
one I have been thinking about is IAM roles. An @iam decorator could really simplify the AWS permissions story, which can be tricky for many data scientists/machine learning engineers. It could be a good idea for Metaflow to have strong opinions about dealing with IAM roles (creating them and assuming them with Lambda/Batch/etc.)
👍 2
v
I like the idea of an @iam / @assume_role decorator! I've actually hacked a decorator like that in the past for a particular use case
👍 1
a
@victorious-lawyer-58417 @strong-flag-58501 here you go - https://gist.github.com/savingoyal/e122c60cbb949cff1fcd914ec56715a8
👍 1
😎 1
v
great, thanks for finding it!
@strong-flag-58501 I don't know if this is exactly what you had in mind but at least this helps to switch between roles
👍 1
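For readers following along, here is a minimal sketch of what a role-switching decorator along these lines could look like. It is not the code from the gist linked above. The names `assume_role`, `assumed_role_env`, and the `sts_client` parameter are hypothetical, and the sketch assumes that exporting temporary credentials through the standard AWS environment variables is enough for the code inside the step. How it composes with Metaflow's own `@step` decorator is untested here.

```python
import functools
import os
from contextlib import contextmanager

# Standard environment variables that boto3 and the AWS CLI read credentials from.
_ENV_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def _default_sts_client():
    # Imported lazily so the sketch can be loaded without boto3 installed.
    import boto3
    return boto3.client("sts")


@contextmanager
def assumed_role_env(role_arn, session_name="metaflow-step", sts_client=None):
    """Temporarily export credentials for role_arn, then restore the old ones."""
    sts = sts_client or _default_sts_client()
    creds = sts.assume_role(
        RoleArn=role_arn, RoleSessionName=session_name
    )["Credentials"]
    saved = {key: os.environ.get(key) for key in _ENV_KEYS}
    os.environ.update({
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    })
    try:
        yield
    finally:
        # Put the previous credentials (or their absence) back in place.
        for key, value in saved.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value


def assume_role(role_arn, sts_client=None):
    """Hypothetical decorator: run the wrapped function under role_arn."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with assumed_role_env(role_arn, sts_client=sts_client):
                return func(*args, **kwargs)
        return wrapper
    return decorator
```

The `sts_client` argument exists only so the role switch can be exercised with a stub instead of real AWS calls. In normal use it would be left as `None`.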
s
exactly what I meant yeah! using it along with namespaces would be brilliant
v
interesting. What do you have in mind for namespaces?
s
something along the lines of multi-account and multi-region flows using both the IAM decorator and namespaces/users, but it's hard to name specific use cases without trying them both first
u
At 23andMe, we have often taken the approach of building this role delegation into our data service clients. One thing to consider is that in organizations with strict security requirements, there may be fairly strict controls over how IAM policies/roles are managed. I think helping users to use roles is easier to solve for than creating them... I am going to add to @straight-shampoo-11124's gist an enhancement that we often use, which is auto-refresh of the temporary credentials. Since Batch runs on ECS, and ECS uses role chaining to provide task role creds to the container, any temporary credentials you obtain on top of that cannot have a duration exceeding 1 hour. This workaround uses a little-documented feature in botocore/boto3 to manage the session for you and grab new creds before that hour is up, which is useful if you have longer-running steps.
🙌 2
among us party 2
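A rough sketch of the auto-refresh idea described above, for anyone who has not seen it before. This is one commonly used pattern and may not match the enhancement mentioned in the message. It relies on botocore's `RefreshableCredentials.create_from_metadata` and on setting the private `_credentials` attribute of a botocore session, which could change between releases. The helper names `make_refresher` and `auto_refreshing_session` are hypothetical.

```python
def make_refresher(role_arn, session_name, sts_client, duration_seconds=3600):
    """Return a callable that fetches fresh credentials in botocore's metadata format."""
    def refresh():
        creds = sts_client.assume_role(
            RoleArn=role_arn,
            RoleSessionName=session_name,
            DurationSeconds=duration_seconds,  # role chaining caps this at one hour
        )["Credentials"]
        return {
            "access_key": creds["AccessKeyId"],
            "secret_key": creds["SecretAccessKey"],
            "token": creds["SessionToken"],
            "expiry_time": creds["Expiration"].isoformat(),
        }
    return refresh


def auto_refreshing_session(role_arn, session_name="metaflow-step"):
    """Build a boto3 session whose assumed-role credentials renew themselves."""
    # Imported lazily so make_refresher can be used and tested without boto3.
    import boto3
    from botocore.credentials import RefreshableCredentials
    from botocore.session import get_session

    refresh = make_refresher(role_arn, session_name, boto3.client("sts"))
    credentials = RefreshableCredentials.create_from_metadata(
        metadata=refresh(),
        refresh_using=refresh,  # botocore calls this again shortly before expiry
        method="sts-assume-role",
    )
    botocore_session = get_session()
    # Assumption: relies on a private attribute, so this is a workaround, not a public API.
    botocore_session._credentials = credentials
    return boto3.Session(botocore_session=botocore_session)
```

Clients created from the returned session, for example `auto_refreshing_session(role_arn).client("s3")`, would then pick up renewed credentials without the step having to restart.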
u
But re: Metaflow having a role in infra creation via Terraform, I was telling @square-wire-39606 earlier: we have actually created an internal project at 23andMe called launchpad that provides a CLI for bootstrapping new Metaflow projects, and part of what it solves for is creating resources that we often treat as project-specific (e.g. a new Batch CE, the associated job role, etc). I think that kind of thing could be easily rolled into Metaflow and still have plenty of utility, without taking on the complexity of VPCs, networking, and everything else that is super org-specific and likely not under the full control of either ML infra or data science teams.