Would anyone like to join me in writing an updated...
# dev-metaflow
l
Would anyone like to join me in writing an updated AWS CDK construct library for Metaflow? I've started already but it's still very early. More details in thread... https://github.com/phitoduck/mlops-club
✅ 1
👍 2
🤗 6
@flaky-plumber-70709 I know you wrote this project last year. Would you be open to PRs to update yours to AWS CDK v2? We could do a restart. I want to make some customizations to our Metaflow deployment at work. We're using CDK v2, so I've been writing my own constructs. Here are some thoughts. Some of these may come across as over-opinionated fighting words. These are just my opinions and I respect anyone who disagrees.
• Hypothesis: I think the project would have more contributors if it were Python-native. Personally I'm decent at TypeScript, but much stronger at Python. I'm assuming that's the case for most of the developers in this community. I know you sacrifice the ability to export IaC to all the CDK-supported languages when you don't write it in TypeScript. I'm open to doing TypeScript if it would get more support. We could survey potential contributors and ask their preference.
• Claim: If you're deploying to AWS (not Kubernetes), CDK is a strong choice:
  ◦ (a) CDK is the most flexible, user-friendly way to write AWS infrastructure these days. There seems to be an army of developers at AWS constantly adding features, since it's AWS's officially supported tool. In commit frequency, it outpaces Pulumi, and it is focused exclusively on AWS. Having infrastructure as actual code is... really nice. If you need any customizations to your Metaflow setup to integrate it with your organization, my team has found CDK to be the quickest way to do so. (Aside: much respect to Terraform for its module system.)
  ◦ (b) Second to raw CloudFormation, CDK makes the fewest assumptions about your environment. With Terraform, you may need to figure out how to manage your state and set up deploy locking. With Pulumi, you pretty much have to use the vendor offering. CDK uses CloudFormation as the deployment mechanism, which is free and doesn't require prior setup.
• Design ideas: I think the components of a Metaflow IaC library should be as loosely coupled as possible, so that individual pieces can be entirely replaced in different ways.
  ◦ You may want to run flows on your own on-premises GPU machines, but use other components (like RDS) in AWS. I think you can achieve that with ECS Anywhere. This could be great for companies/hobbyists/research labs with their own hardware.
  ◦ You may want to protect the UI in different ways: you could use AWS Cognito to put a login page in front of the Metaflow UI. Or you might want to use a different auth provider like Active Directory or Auth0. Or maybe you'd skip auth altogether and just put these things into an existing VPC set up with a VPN.
  ◦ Save lots of money by hosting the SQL database on something that isn't RDS, and the containers on something that isn't ECS. For example, if you were motivated, you could put all of those things on a single AWS Lightsail instance for $10/mo. I could see a research lab doing this... and me, just for fun.
  ◦ Straightforwardly modify the AWS Batch settings to use a private PyPI server. My company definitely has private Python packages with utilities used during training and batch serving. Having flexibility would make this easier to set up.
  ◦ Mess with networking and IAM permissions if needed to give Metaflow runs access to protected services like a feature store, a tracking server (thinking of MLflow and maybe Optuna), a data warehouse, etc.
In short, I think there are a lot of good reasons that hobbyists, research labs, and businesses might want to customize their Metaflow deployment using AWS CDK. If folks agree, I'd love to collaborate on this. Maybe we could even get it to the point that it moves under the official Metaflow umbrella 🤞 (I appreciate that it's hard to maintain IaC modules when there are a zillion IaC frameworks out there, so that may not work out).
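To make the loose-coupling idea a bit more concrete, here is a rough sketch in plain Python (no CDK dependency), not an actual construct library. Every class name here (`MetadataDatabase`, `UiAuth`, `RdsDatabase`, `SelfHostedDatabase`, `CognitoAuth`, `VpnOnly`, `MetaflowDeployment`) is hypothetical and invented for illustration. The `MF_METADATA_DB_*` names are assumed to match what the Metaflow metadata service reads, so check them against the service docs before relying on them. In a real library each piece would presumably be a CDK construct, but the shape of the interfaces could look similar.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class MetadataDatabase(Protocol):
    """Any SQL database the metadata service can reach (hypothetical interface)."""

    def connection_env(self) -> Dict[str, str]:
        ...


class UiAuth(Protocol):
    """Any way of protecting the Metaflow UI (hypothetical interface)."""

    def describe(self) -> str:
        ...


def _db_env(host: str, port: int) -> Dict[str, str]:
    # Assumed env var names for the metadata service; verify before use.
    return {"MF_METADATA_DB_HOST": host, "MF_METADATA_DB_PORT": str(port)}


@dataclass
class RdsDatabase:
    endpoint: str
    port: int = 5432

    def connection_env(self) -> Dict[str, str]:
        return _db_env(self.endpoint, self.port)


@dataclass
class SelfHostedDatabase:
    """For example Postgres on a single Lightsail instance or an on-prem box."""

    host: str
    port: int = 5432

    def connection_env(self) -> Dict[str, str]:
        return _db_env(self.host, self.port)


@dataclass
class CognitoAuth:
    user_pool_name: str

    def describe(self) -> str:
        return f"Cognito login page (user pool: {self.user_pool_name})"


@dataclass
class VpnOnly:
    vpc_id: str

    def describe(self) -> str:
        return f"no login page; reachable only inside VPC {self.vpc_id}"


@dataclass
class MetaflowDeployment:
    """Composes the pieces; it only depends on the two small interfaces above."""

    database: MetadataDatabase
    ui_auth: UiAuth

    def metadata_service_env(self) -> Dict[str, str]:
        return dict(self.database.connection_env())

    def summary(self) -> str:
        host = self.database.connection_env()["MF_METADATA_DB_HOST"]
        return f"metadata DB at {host}; UI: {self.ui_auth.describe()}"


if __name__ == "__main__":
    # Swap either piece without touching the other.
    cloud = MetaflowDeployment(RdsDatabase("db.example.internal"), CognitoAuth("metaflow-users"))
    budget = MetaflowDeployment(SelfHostedDatabase("10.0.0.5"), VpnOnly("vpc-123"))
    print(cloud.summary())
    print(budget.summary())
```

The point of the sketch is only that `MetaflowDeployment` never names RDS, Cognito, or ECS directly, so a Lightsail-hosted database or a VPN-only UI is a drop-in replacement rather than a fork of the library.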
šŸ‘ 1
a
we've deployed Metaflow internally within our company using CDK v2, happy to collaborate.
yayfox 1
l
That would be great! What are your thoughts on using Python as the language for the constructs? Or would you want to use TypeScript?
a
My understanding from a few CDK users is that TypeScript is easier to debug. However, I'm only familiar with Python and have been using that for our CDK implementation.
f
Sorry for the delayed response here. I'd love to reboot and collaborate. I think TypeScript should be the target because it allows you to create packages for other languages using jsii, and the tooling for this is great with the projen lib. My old-timey project created Go/JS/Python libraries and pushed them to PyPI/npm: https://github.com/bcgalvin/cdk-metaflow/blob/main/.projenrc.js#L86
Lately I've been using cdktf to wrap the existing Terraform modules, but I think a pure CDK implementation would be great.
l
@flaky-plumber-70709! Thanks for responding. Yeah, if people like you would rather use TS, I'm good with that. Would you want to do this as a major version change to your current PyPI package?
I imagine a good amount might get reworked as we transition from CDK v1 to v2.
I could take a closer look to understand your current implementation. This could be a good opportunity to revisit methods to loosely couple the components
r
Hello! Sorry for popping up in this thread, but I was wondering if you had any news or advice on this topic. I have infra written in CDK v2 and want to add a Metaflow construct that reuses some resources we have already defined (like a VPC). What would be my best option, given that I don't have time yet to create a CDK v2 construct for it? Should I create a separate stack with Terraform and hardcode my reused values, or do you recommend cdktf? Thank you! If you have a CDK v2 version in progress and need help, I can always try to ask for time! 🙂 Thank you!
l
Man, sorry for never responding to this! 😕 You probably solved your problem, but yes, I would have recommended using their pre-made Terraform or CloudFormation template and then referencing those resources in the rest of your CDK app.
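One low-tech way to do the "referencing" part, sketched under assumptions: deploy the pre-made CloudFormation template as its own stack, read its outputs, and hand those values to the CDK app (for example as context values). The helper below only parses the JSON that `aws cloudformation describe-stacks` prints. The function name `stack_outputs`, the stack name `metaflow`, and the output keys and values in the example are all hypothetical; the real template's output names may differ.

```python
import json
from typing import Dict


def stack_outputs(describe_stacks_json: str, stack_name: str) -> Dict[str, str]:
    """Flatten one stack's Outputs from `aws cloudformation describe-stacks` JSON."""
    response = json.loads(describe_stacks_json)
    for stack in response.get("Stacks", []):
        if stack.get("StackName") == stack_name:
            # Each output is a dict with "OutputKey" and "OutputValue".
            return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}
    raise KeyError(f"stack {stack_name!r} not found in describe-stacks output")


# Hypothetical output of: aws cloudformation describe-stacks --stack-name metaflow
# (output keys and values are made up for illustration)
EXAMPLE_DESCRIBE_STACKS = """
{
  "Stacks": [
    {
      "StackName": "metaflow",
      "Outputs": [
        {"OutputKey": "VpcId", "OutputValue": "vpc-0123456789abcdef0"},
        {"OutputKey": "DatastoreS3Url", "OutputValue": "s3://example-metaflow-bucket/metaflow"}
      ]
    }
  ]
}
"""

if __name__ == "__main__":
    outputs = stack_outputs(EXAMPLE_DESCRIBE_STACKS, "metaflow")
    for key, value in outputs.items():
        print(f"{key}={value}")
```

The values can then be fed into the rest of the CDK app however is convenient, which avoids hardcoding IDs in source while keeping the Metaflow stack and the CDK stacks deployable independently.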
r
Oh, no worries, I'm doing it on my side and will share when I get time! Thank you!