# ask-metaflow
m
Hello, I stumbled upon Metaflow today and I was wondering if it supports heterogeneous clusters. The use case is as follows: say you want to process videos with a deep learning model on GPUs. Decoding videos requires a lot of CPU compute, and is currently cheapest on ARM cores with instances like c7g on AWS, while the deep learning model needs GPUs to run. Using a GPU instance to both decode and process the video often leads to underprovisioning of CPU compute. You can dramatically reduce the cost by decoding videos on dedicated CPU instances and transferring the decoded video through a socket to GPU instances for processing. This in turn means that you need to set up a heterogeneous cluster with a hybrid workflow: one part on CPU instances and one on GPU instances. Is this scenario supported by Metaflow?
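The decode-and-stream idea above can be sketched with plain TCP sockets. This is a minimal, illustrative sketch: the length-prefixed framing, the `send_frames`/`recv_frames` helpers, and the placeholder byte-string "frames" are all assumptions for demonstration, not part of any real decoding pipeline.

```python
# CPU side decodes frames and streams them; GPU side consumes them.
# Frames are length-prefixed so the receiver knows message boundaries;
# here "decoded frames" are just placeholder byte strings.
import socket
import struct
import threading

def send_frames(conn, frames):
    """Send each frame as a 4-byte big-endian length prefix + payload."""
    for frame in frames:
        conn.sendall(struct.pack(">I", len(frame)) + frame)
    conn.sendall(struct.pack(">I", 0))  # zero length = end of stream

def recv_frames(conn):
    """Yield frames until the zero-length end-of-stream marker."""
    def recv_exact(n):
        buf = b""
        while len(buf) < n:
            chunk = conn.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("socket closed mid-frame")
            buf += chunk
        return buf
    while True:
        (length,) = struct.unpack(">I", recv_exact(4))
        if length == 0:
            return
        yield recv_exact(length)

def demo():
    """Run producer and consumer over localhost to show the framing works."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    frames = [b"frame-%d" % i for i in range(3)]

    def producer():
        with socket.create_connection(("127.0.0.1", port)) as conn:
            send_frames(conn, frames)

    t = threading.Thread(target=producer)
    t.start()
    conn, _ = server.accept()
    with conn:
        received = list(recv_frames(conn))
    t.join()
    server.close()
    return received
```

In a real deployment the producer would run on the c7g decoder nodes and the consumer on the GPU nodes, with the port exchanged out of band.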
v
that's a fun scenario! I can quickly imagine three ways you could do it with Metaflow. 1. Two flows: a. `MainDecodingFlow` runs on CPUs. It gets everything ready for decoding (you could use `foreach` or `parallel` for distributed decoding), and then uses event triggering to launch the `TrainingFlow`, pausing all tasks until the receiving end is ready (easy to do with sockets). b. `TrainingFlow` runs on GPUs - it's event-triggered by `MainDecodingFlow` and waits for the incoming video stream. Once the stream starts flowing, it can spin up training on GPUs. 2. Leverage Metaflow's support for ephemeral compute clusters with interconnected nodes, but you'd need to wait for support for heterogeneous clusters, which is on the roadmap. 3. Leverage Metaflow's support for various clouds, including GPU clouds like Nebius, which lets you find nodes outside AWS with a suitable balance of CPUs and GPUs, so you can just pack everything into one big box (maybe leveraging GPU-accelerated decoding too)
m
> Leverage Metaflow's support for various clouds, including GPU clouds like Nebius, which allows you to find nodes outside AWS with a suitable balance of CPUs and GPUs so you can just pack everything in a big box (maybe leveraging GPU-accelerated decoding too)

The problem with this is that CPU compute on GPU boxes is always pricier than on CPU-only instances, especially ARM instances, and the speedup from GPU decoding is not enough to make it cheaper than CPU decoding
> you'd need to wait for support for heterogeneous clusters which is on the roadmap.

Is there a way I can track the progress of this feature, so that I get a notification when it is ready?
v
right, you'd have to do the math on the total cost. A single big box avoids networking overhead, spin-up overhead, etc., so while the cost per CPU-second may be higher, the total cost of getting the job done could be lower - much depends on the GPU type too, and on the nature of your training.
m
So, for our type of training, we already did the math, and we regularly run heterogeneous clusters with a custom Terraform-based implementation. Using cluster placement groups on AWS makes the networking overhead almost negligible, and the cost reduction from a heterogeneous cluster is very significant.
v
meanwhile you can try the first approach if you are curious - it should work
m
Thank you !
> meanwhile you can try the first approach if you are curious - it should work

Do you know if I can make sure that the two flows are in the same placement group on AWS?
v
if you use `@batch`, create two compute environments with the same placement group. I am not sure if you can have CPU and GPU instances in the same placement group, though. But even without a placement group, if you stay within an AZ you should be able to get 10-50 Gbps, which is pretty decent for uncompressed video, unless you have multiple HD streams being decoded in parallel
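A hedged sketch of pointing a step at a specific AWS Batch job queue via `@batch`: the queue name below is hypothetical, and each queue would map to a compute environment configured with the desired instance types (and, where possible, a placement group).

```python
# "cpu-decode-queue" is a hypothetical AWS Batch job queue backed by a
# compute environment of c7g (ARM/CPU) instances; a second queue backed
# by GPU instances would serve the training flow.
from metaflow import FlowSpec, batch, step

class DecodeFlow(FlowSpec):
    @batch(cpu=16, memory=32000, queue="cpu-decode-queue")
    @step
    def start(self):
        # runs on the CPU compute environment
        self.next(self.end)

    @step
    def end(self):
        pass
```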
if you use EKS, you can set up node groups accordingly
feel free to follow up here if you need help 🙂
m
thank you for the tips, I'll try it when I have the time 🙂
🙌 1