most-analyst-45184
07/18/2025, 6:32 AM

victorious-lawyer-58417
07/18/2025, 8:49 AM
1. Split the work into two flows (see the sketch after this list):
   a. MainDecodingFlow runs on CPUs. It gets everything ready for decoding (you could use foreach or parallel for distributed decoding), then uses event triggering to launch the TrainingFlow and pauses all tasks until the receiving end is ready (easy to do with sockets).
   b. TrainingFlow runs on GPUs. It is event-triggered by MainDecodingFlow and waits for the incoming video stream. Once the stream starts flowing, it can spin up training on the GPUs.
2. Leverage Metaflow's support for ephemeral compute clusters with interconnected nodes, but you'd need to wait for support for heterogeneous clusters, which is on the roadmap.
3. Leverage Metaflow's support for various clouds, including GPU clouds like Nebius, which lets you find nodes outside AWS with a suitable balance of CPUs and GPUs so you can just pack everything into one big box (maybe leveraging GPU-accelerated decoding too).
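Here is roughly what option 1 could look like, as a minimal sketch. It assumes a Metaflow deployment backed by Argo Workflows (which is what @trigger and ArgoEvent rely on); the event name decoder_ready, the shard names, the resource sizes, and the socket hand-off comments are illustrative assumptions, not anything prescribed by Metaflow.

```
# main_decoding_flow.py -- CPU-only flow: prepares decoding, signals the GPU flow.
from metaflow import FlowSpec, resources, step
from metaflow.integrations import ArgoEvent


class MainDecodingFlow(FlowSpec):

    @step
    def start(self):
        # Split the input into shards so decoding can be distributed (foreach).
        self.shards = ["videos-part-0", "videos-part-1"]
        self.next(self.decode, foreach="shards")

    @resources(cpu=16, memory=32000)
    @step
    def decode(self):
        # Tell the training side that this shard is about to stream.
        ArgoEvent(name="decoder_ready").publish(payload={"shard": self.input})
        # ... open a socket, block until TrainingFlow connects,
        #     then stream decoded frames ...
        self.next(self.join)

    @step
    def join(self, inputs):
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    MainDecodingFlow()


# training_flow.py -- GPU flow: deployed separately, starts on the event.
from metaflow import FlowSpec, resources, step, trigger


@trigger(event="decoder_ready")
class TrainingFlow(FlowSpec):

    @resources(gpu=4, memory=64000)
    @step
    def start(self):
        # ... connect to the decoder's socket, consume frames, train ...
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    TrainingFlow()
```

In practice you would deploy TrainingFlow first (python training_flow.py argo-workflows create) so it is already listening for the event before MainDecodingFlow starts publishing.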

most-analyst-45184
07/18/2025, 8:52 AM
> you'd need to wait for support for heterogeneous clusters which is on the roadmap.
Is there a way I can track the progress of this feature, so that I can get a notification when it is ready?

victorious-lawyer-58417
07/18/2025, 9:12 AM
With @batch, create two compute environments with the same placement group. I am not sure whether you can have CPU and GPU instances in the same placement group, but you can try.
Even without a placement group, if you stay within an AZ you should be able to get 10-50 Gbps, which is pretty decent for uncompressed video, unless you have multiple HD streams being decoded in parallel.
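
For reference, a minimal boto3 sketch of that idea: one EC2 cluster placement group shared by a CPU and a GPU Batch compute environment, both pinned to a single AZ. Every name, subnet, security group, role, and instance type below is a placeholder assumption, not a value from this thread, and whether Batch accepts mixed CPU/GPU instance families in one cluster placement group is exactly the open question above.

```
import boto3

ec2 = boto3.client("ec2")
batch = boto3.client("batch")

# Cluster placement groups pack instances close together in one AZ for
# high-throughput, low-latency networking between them.
ec2.create_placement_group(GroupName="video-pipeline-pg", Strategy="cluster")

# Settings shared by both compute environments (all values are placeholders).
common = {
    "type": "EC2",
    "minvCpus": 0,
    "maxvCpus": 256,
    "subnets": ["subnet-aaaa1111"],          # one subnet => one AZ
    "securityGroupIds": ["sg-bbbb2222"],
    "instanceRole": "ecsInstanceRole",
    "placementGroup": "video-pipeline-pg",   # shared by both environments
}

# CPU environment for the decoding flow.
batch.create_compute_environment(
    computeEnvironmentName="decode-cpu-env",
    type="MANAGED",
    state="ENABLED",
    computeResources={**common, "instanceTypes": ["c6i.8xlarge"]},
)

# GPU environment for the training flow.
batch.create_compute_environment(
    computeEnvironmentName="train-gpu-env",
    type="MANAGED",
    state="ENABLED",
    computeResources={**common, "instanceTypes": ["g5.12xlarge"]},
)
```

As a rough sanity check on those bandwidth numbers: uncompressed 8-bit 4:2:0 1080p30 is about 1920 x 1080 x 1.5 bytes x 30 fps, roughly 93 MB/s or 0.75 Gbps per stream, so a 10 Gbps link fits about a dozen such streams, while uncompressed 4K60 is closer to 6 Gbps per stream.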

most-analyst-45184
07/18/2025, 9:16 AM