cbi_toolbox.parallel

Module contents

The parallel package provides tools to split computations across parallel workers.

cbi_toolbox.parallel.distribute_bin(dimension, rank, workers)[source]

Computes the start index and bin size to evenly split array-like data into multiple bins.

Parameters:
  • dimension (int) – The size of the array to distribute.

  • rank (int) – The rank of the worker.

  • workers (int) – The total number of workers.

Returns:

The start index of this worker's bin and the size of the bin. The worker's share of the data is array[start:start + bin_size].

Return type:

(int, int)
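The even split described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the name even_bin is hypothetical, and it assumes that when dimension is not a multiple of workers, the lowest ranks each take one extra element. distribute_bin may place the remainder differently.

```python
def even_bin(dimension, rank, workers):
    """Hypothetical stand-in for distribute_bin: returns (start, bin_size)."""
    base, extra = divmod(dimension, workers)
    # Assumption: the first `extra` ranks each take one additional element.
    bin_size = base + (1 if rank < extra else 0)
    # Every earlier rank contributed `base` elements, plus one for each
    # earlier rank that received an extra element.
    start = rank * base + min(rank, extra)
    return start, bin_size


data = list(range(10))
for rank in range(3):
    start, bin_size = even_bin(len(data), rank, 3)
    print(rank, data[start:start + bin_size])
# 0 [0, 1, 2, 3]
# 1 [4, 5, 6]
# 2 [7, 8, 9]
```

Each worker only needs its own rank and the total number of workers to find its slice, so no communication between workers is required to agree on the split.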

cbi_toolbox.parallel.distribute_bin_all(dimension, workers)[source]

Computes the start indexes and bin sizes of all splits to distribute computations across multiple workers.

Parameters:
  • dimension (int) – The size of the array to distribute.

  • workers (int) – The total number of workers.

Returns:

The list of start indexes and the list of bin sizes to distribute data.

Return type:

([int], [int])
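A minimal sketch of computing every bin at once, under the same assumption as for a single bin (the lowest ranks absorb the remainder). The name even_bins_all is hypothetical and the code only illustrates the shape of the return value, two lists of equal length; it is not the code of distribute_bin_all.

```python
def even_bins_all(dimension, workers):
    """Hypothetical stand-in for distribute_bin_all: returns (starts, sizes)."""
    base, extra = divmod(dimension, workers)
    # Assumption: the first `extra` bins are one element larger than the rest.
    sizes = [base + (1 if rank < extra else 0) for rank in range(workers)]
    # Each bin starts where the previous bins end.
    starts = [sum(sizes[:rank]) for rank in range(workers)]
    return starts, sizes


starts, sizes = even_bins_all(10, 4)
print(starts)  # [0, 3, 6, 8]
print(sizes)   # [3, 3, 2, 2]

# The bins tile the whole range without gaps or overlaps.
covered = [i for s, n in zip(starts, sizes) for i in range(s, s + n)]
print(covered == list(range(10)))  # True
```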

cbi_toolbox.parallel.parallelize(func, size, workers=None)[source]

Runs a function multiple times in parallel using multithreading. This is useful only if the parallelized function releases the GIL (as many NumPy and SciPy routines do).

Parameters:
  • func (function (callable)) – The function that will be run in parallel. It must take two arguments, which come from the values returned by distribute_bin_all for the thread pool (the list of start indexes of the data bins and the list of bin sizes).

  • size (int) – The size of the array that will be split between workers.

  • workers (int, optional) – The maximum number of workers. Defaults to None, in which case the number of workers is maximized for the system.

Returns:

An iterator over the results of the function calls, in no guaranteed order (see concurrent.futures.ThreadPoolExecutor.map).

Return type:

iterator
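The calling pattern can be sketched with the standard library alone. Everything named here is hypothetical (run_in_bins, partial_sum), and the sketch assumes that the function is called once per bin with that bin's start index and size; the actual behaviour of parallelize may differ. Plain Python code like partial_sum holds the GIL, so this example shows the structure only and will not run faster than a serial loop.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_bins(func, size, workers=4):
    """Hypothetical stand-in for parallelize, using an even split of `size`."""
    workers = max(1, min(workers, size))
    base, extra = divmod(size, workers)
    # Assumption: the first `extra` bins are one element larger than the rest.
    sizes = [base + (1 if rank < extra else 0) for rank in range(workers)]
    starts = [sum(sizes[:rank]) for rank in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # Assumption: func is called once per bin as func(start, bin_size).
        return list(executor.map(func, starts, sizes))


data = list(range(100))


def partial_sum(start, bin_size):
    # Each call only reads its own slice of the shared data.
    return sum(data[start:start + bin_size])


results = run_in_bins(partial_sum, len(data), workers=4)
total = sum(results)
print(total)  # 4950
```

Combining the partial results with an order-independent reduction such as a sum means the code does not rely on the order in which results are returned.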