# ask-metaflow
Currently we are processing huge files (~100-150 GB) on k8s using Metaflow, but this involves copying the data from S3 to the EC2 instance where the pod is running. I saw the fast data blog post, but that approach requires enough memory to hold all of the data at once. Is there something intermediate between downloading all of the data inside the pod and loading everything into memory? The only idea I can think of is to split such files into smaller chunks and then load them into memory one at a time.
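To make the chunking idea concrete, here is a minimal sketch (not an official Metaflow recipe) of reading an S3 object in fixed-size byte ranges with boto3, so only one chunk is resident in memory at a time; the bucket, key, chunk size, and `process_chunk` function are all hypothetical placeholders:

```python
import boto3

BUCKET = "my-bucket"             # hypothetical bucket name
KEY = "path/to/huge-file.bin"    # hypothetical object key
CHUNK_SIZE = 256 * 1024 * 1024   # 256 MB per range request (tune to available memory)

def process_chunk(data: bytes) -> None:
    # hypothetical per-chunk processing
    ...

s3 = boto3.client("s3")

# Total object size, used to compute byte ranges.
size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]

# Fetch and process the object one byte range at a time.
for start in range(0, size, CHUNK_SIZE):
    end = min(start + CHUNK_SIZE, size) - 1
    resp = s3.get_object(Bucket=BUCKET, Key=KEY, Range=f"bytes={start}-{end}")
    process_chunk(resp["Body"].read())
```

This keeps peak memory at roughly one chunk rather than the full file, at the cost of one GET request per range; a step in a Metaflow flow could run this loop instead of downloading the whole object first.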