# ask-metaflow
Hello, I had a question about how `parallel_map` behaves, to check whether my understanding of what I am seeing is correct, and to ask whether there are any workarounds. It appears to me that when passing a large iterable (much larger than the number of available cores) to `parallel_map`, the processes are run in batches rather than streamed. For clarity, here are my definitions of batching and streaming (a sketch of both patterns follows the list):

• batching: run `n` processes, wait for all of them to complete, then run the next `n` processes
• streaming: run `n` processes, then start the next `1` process as soon as resources become available
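
To make the distinction concrete, here is a minimal sketch of the two patterns using `concurrent.futures` (`process_one`, `items`, and `n` are hypothetical stand-ins for my real per-property work, not anything from Metaflow):

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def run_batched(process_one, items, n):
    """Batching: submit n tasks, wait for the whole group, then submit the next n."""
    results = []
    with ProcessPoolExecutor(max_workers=n) as pool:
        for i in range(0, len(items), n):
            # consuming pool.map blocks until the whole batch has finished,
            # so workers that finish early sit idle behind one slow item
            results.extend(pool.map(process_one, items[i:i + n]))
    return results

def run_streamed(process_one, items, n):
    """Streaming: keep n workers busy; the next item starts as soon as any worker frees up."""
    with ProcessPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(process_one, item) for item in items]
        # results arrive in completion order, not input order
        return [f.result() for f in as_completed(futures)]
```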
I think `parallel_map` is using batching because I see all `n` processes begin, then a slow reduction in the number of active processes, and eventually a flare back up to `n` processes as the loop repeats.

I am extracting spatial data over property boundaries whose sizes vary widely. With the batching behavior, a single large property can effectively block computation on all `n` cores if it takes significantly longer than the other properties in its batch. Streaming (the way Metaflow schedules individual workers across resources) would mean less work spent trying to balance batches.

I currently shard a large set of properties into smaller sets that can each be sent to a worker on AWS Batch. Within these sets it is most efficient to process individual properties and polygons, but because I am using Batch I do not want each property to become a task that ends up in the Batch job queue.

So far I have tried or considered:
• Splitting properties into equal-sized units (workable, but this is introducing complications elsewhere in my flow)
• Moving to `Pool().map`, but I don't know whether it has the same batching behavior (I am more familiar with multiprocessing in R, where streaming is the norm in the `furrr` package); see the first sketch below
• Doing something wild like splitting this step into its own flow and calling it via subprocess within each Batch worker, so the available cores are used for individual tasks within a worker; effectively, `--max-workers` in the parent flow would control the number of Batch workers, and `--max-workers` in the child flow could then use all resources available on each worker (see the second sketch below)

Would appreciate any tips or other approaches to parallelizing this step that make more efficient use of the available cores.
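
On the `Pool().map` bullet: my understanding, which may be wrong, is that `multiprocessing.Pool.imap_unordered` with `chunksize=1` gives the streaming behavior I am after, since each item is handed to the next free worker and results are yielded as they complete. A minimal sketch, with `process_one` and the toy data as hypothetical stand-ins:

```python
from multiprocessing import Pool, cpu_count

def process_one(prop):
    # stand-in for the real spatial extraction over one property polygon
    return prop["id"], len(prop["polygon"])

if __name__ == "__main__":
    properties = [{"id": i, "polygon": "x" * (i % 7 + 1)} for i in range(100)]
    with Pool(processes=cpu_count()) as pool:
        # chunksize=1 hands each property to the next idle worker, and
        # imap_unordered yields results as they finish, so one slow
        # property never stalls the rest of the pool
        for prop_id, size in pool.imap_unordered(process_one, properties, chunksize=1):
            pass  # collect/aggregate results here
```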
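
And the "wild" child-flow idea from the last bullet would look roughly like this. It is untested, and `child_flow.py` and its `--shard` parameter are hypothetical, though `--max-workers` is Metaflow's standard `run` option for capping concurrent tasks:

```python
import os
import subprocess

def run_child_flow(shard_path: str) -> None:
    # called from inside a @batch step of the parent flow: run the child
    # flow locally on this worker so its foreach over individual
    # properties can use every core the Batch job was allocated
    subprocess.run(
        [
            "python", "child_flow.py", "run",
            "--max-workers", str(os.cpu_count()),
            "--shard", shard_path,
        ],
        check=True,
    )
```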