# ask-metaflow
Hello, I had a question about how `parallel_map` behaves, to check whether my understanding of what I am seeing is correct, and to ask whether there are any workarounds. It appears to me that when passing a large iterable (much larger than the number of available cores) to `parallel_map`, the processes are run in batches rather than streamed. For clarity, here are my definitions of batching and streaming (a sketch of both patterns follows the list):

• batching: run `n` processes, wait for all of them to complete, then run the next `n` processes
• streaming: run `n` processes, then start the next `1` process as soon as resources become available
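
To make the distinction concrete, here is a minimal sketch of the two patterns using `concurrent.futures` (`process_one`, `items`, and `n` are hypothetical stand-ins for my real per-property work, not anything from Metaflow):

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def run_batched(process_one, items, n):
    """Batching: submit n tasks, wait for the whole group, then submit the next n."""
    results = []
    with ProcessPoolExecutor(max_workers=n) as pool:
        for i in range(0, len(items), n):
            # consuming pool.map blocks until the whole batch has finished,
            # so workers that finish early sit idle behind one slow item
            results.extend(pool.map(process_one, items[i:i + n]))
    return results

def run_streamed(process_one, items, n):
    """Streaming: keep n workers busy; the next item starts as soon as any worker frees up."""
    with ProcessPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(process_one, item) for item in items]
        # results arrive in completion order, not input order
        return [f.result() for f in as_completed(futures)]
```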
I think `parallel_map` is using batching because I see all `n` processes begin, then a slow reduction in the number of active processes, and eventually a flare back up to `n` processes as the loop repeats.

I am extracting spatial data over property boundaries whose sizes vary widely. With the batching behavior, a single large property can effectively block computation on all `n` cores if it takes significantly longer than the other properties in its batch. Streaming (the way Metaflow schedules individual workers across resources) would mean less work spent trying to balance batches.

I currently shard a large set of properties into smaller sets that can each be sent to a worker on AWS Batch. Within these sets it is most efficient to process individual properties and polygons, but because I am using Batch I do not want each property to become a task that ends up in the Batch job queue.

So far I have tried or considered:
• Splitting properties into equal-sized units (workable, but this is introducing complications elsewhere in my flow)
• Moving to `Pool().map`, but I don't know whether it has the same batching behavior (I am more familiar with multiprocessing in R, where streaming is the norm in the `furrr` package); see the first sketch below
• Doing something wild like splitting this step into its own flow and calling it via subprocess within each Batch worker, so the available cores are used for individual tasks within a worker; effectively, `--max-workers` in the parent flow would control the number of Batch workers, and `--max-workers` in the child flow could then use all resources available on each worker (see the second sketch below)

Would appreciate any tips or other approaches to parallelizing this step that make more efficient use of the available cores.
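
On the `Pool().map` bullet: my understanding, which may be wrong, is that `multiprocessing.Pool.imap_unordered` with `chunksize=1` gives the streaming behavior I am after, since each item is handed to the next free worker and results are yielded as they complete. A minimal sketch, with `process_one` and the toy data as hypothetical stand-ins:

```python
from multiprocessing import Pool, cpu_count

def process_one(prop):
    # stand-in for the real spatial extraction over one property polygon
    return prop["id"], len(prop["polygon"])

if __name__ == "__main__":
    properties = [{"id": i, "polygon": "x" * (i % 7 + 1)} for i in range(100)]
    with Pool(processes=cpu_count()) as pool:
        # chunksize=1 hands each property to the next idle worker, and
        # imap_unordered yields results as they finish, so one slow
        # property never stalls the rest of the pool
        for prop_id, size in pool.imap_unordered(process_one, properties, chunksize=1):
            pass  # collect/aggregate results here
```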
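
And the "wild" child-flow idea from the last bullet would look roughly like this. It is untested, and `child_flow.py` and its `--shard` parameter are hypothetical, though `--max-workers` is Metaflow's standard `run` option for capping concurrent tasks:

```python
import os
import subprocess

def run_child_flow(shard_path: str) -> None:
    # called from inside a @batch step of the parent flow: run the child
    # flow locally on this worker so its foreach over individual
    # properties can use every core the Batch job was allocated
    subprocess.run(
        [
            "python", "child_flow.py", "run",
            "--max-workers", str(os.cpu_count()),
            "--shard", shard_path,
        ],
        check=True,
    )
```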