cbi_toolbox.parallel.mpi

The mpi module allows to distribute operations in MPI communicators.

It is an optional feature of cbi_toolbox that requires a working implementation of MPI.

cbi_toolbox.parallel.mpi.compute_vector_extent(axis, array=None, shape=None, dtype=None)[source]

Compute the extent in bytes of a sliced view of a given array.

Parameters:
  • axis (int) – Axis on which the slices are taken.

  • array (numpy.ndarray, optional) – The array to slice, by default None (then shape and dtype must be given).

  • shape (the shape of the array to slice, optional) – The shape of the array, by default None.

  • dtype (numpy.dtype or str, optional) – The datatype of the array, by default None.

Returns:

The extent of the slices underlying data.

Return type:

int

Raises:

ValueError – If array, shape and dtype are all None.

cbi_toolbox.parallel.mpi.create_slice_view(axis, n_slices, array=None, shape=None, dtype=None)[source]

Create a MPI vector datatype to access given slices of a non distributed array. If the array is not provided, its shape and dtype must be specified.

Parameters:
  • axis (int) – The axis on which to slice.

  • n_slices (int) – How many contiguous slices to take.

  • array (numpy.ndarray, optional) – The array to slice, by default None (then shape and dtype must be given).

  • shape (the shape of the array to slice, optional) – The shape of the array, by default None.

  • dtype (numpy.dtype or str, optional) – The datatype of the array, by default None.

Returns:

The strided datatype allowing to access slices in the array.

Return type:

mpi4py.MPI.Datatype

Raises:

ValueError – If array, shape and dtype are all None.

cbi_toolbox.parallel.mpi.create_vector_type(src_axis, tgt_axis, array=None, shape=None, dtype=None, block_size=1)[source]

Create a MPI vector datatype to communicate a distributed array and split it along a different axis.

Parameters:
  • src_axis (int) – The original axis on which the array is distributed.

  • tgt_axis (int) – The axis on which the array is to be distributed.

  • array (numpy.ndarray, optional) – The array to slice, by default None (then shape and dtype must be given).

  • shape (the shape of the array to slice, optional) – The shape of the array, by default None.

  • dtype (numpy.dtype or str, optional) – The datatype of the array, by default None.

  • block_size (int, optional) – The size of the distributed bin, by default 1.

Returns:

The vector datatype used for transmission/reception of the data.

Return type:

mpi4py.MPI.Datatype

Raises:
  • ValueError – If array, shape and dtype are all None.

  • ValueError – If the source and destination axes are the same.

  • NotImplementedError – If the array has more than 4 axes (should work, but tests needed).

  • ValueError – If the block size is bigger than the source axis.

cbi_toolbox.parallel.mpi.distribute_mpi(dimension, mpi_comm=<mpi4py.MPI.Intracomm object>)[source]

Computes the start index and bin size to evenly split array-like data into multiple bins on an MPI communicator.

Parameters:
  • dimension (int) – The size of the array to distribute.

  • mpi_comm (mpi4py.MPI.Comm, optional) – The communicator, by default MPI.COMM_WORLD.

Returns:

The start index of this bin, and its size. The distributed data should be array[start:start + bin_size].

Return type:

(int, int)

cbi_toolbox.parallel.mpi.distribute_mpi_all(dimension, mpi_comm=<mpi4py.MPI.Intracomm object>)[source]

Computes the start indexes and bin sizes of all splits to distribute computations across an MPI communicator.

Parameters:
  • dimension (int) – the size of the array to be distributed

  • mpi_comm (mpi4py.MPI.Comm, optional) – the communicator, by default MPI.COMM_WORLD

Returns:

The list of start indexes and the list of bin sizes to distribute data.

Return type:

([int], [int])

cbi_toolbox.parallel.mpi.gather_full_shape(array, axis, mpi_comm=<mpi4py.MPI.Intracomm object>)[source]

Gather the full shape of an array distributed across an MPI communicator along a given axis.

Parameters:
  • array (numpy.ndarray) – The distributed array.

  • axis (int) – The axis on which the array is distributed.

  • mpi_comm (mpi4py.MPI.Comm, optional) – The communicator, by default MPI.COMM_WORLD.

Raises:

NotImplementedError – This is not implemented yet.

cbi_toolbox.parallel.mpi.get_rank(mpi_comm=<mpi4py.MPI.Intracomm object>)[source]

Get this process number in the communicator.

Parameters:

mpi_comm (mpi4py.MPI.Comm, optional) – The communicator, by default MPI.COMM_WORLD.

Returns:

The rank of the process.

Return type:

int

cbi_toolbox.parallel.mpi.get_size(mpi_comm=<mpi4py.MPI.Intracomm object>)[source]

Get the process count in the communicator.

Parameters:

mpi_comm (mpi4py.MPI.Comm, optional) – The MPI communicator, by default MPI.COMM_WORLD.

Returns:

The size of the MPI communicator.

Return type:

int

cbi_toolbox.parallel.mpi.is_root_process(mpi_comm=<mpi4py.MPI.Intracomm object>)[source]

Check if the current process is root.

Parameters:

mpi_comm (mpi4py.MPI.Comm, optional) – The MPI communicator, by default MPI.COMM_WORLD.

Returns:

True if the current process is the root of the communicator.

Return type:

bool

cbi_toolbox.parallel.mpi.load(file_name, axis, mpi_comm=<mpi4py.MPI.Intracomm object>)[source]

Load a numpy array across parallel jobs in the MPI communicator. The array is sliced along the chosen dimension, with minimal bandwidth.

Parameters:
  • file_name (str) – The numpy array file to load.

  • axis (int) – The axis on which to distribute the array.

  • mpi_comm (mpi4py.MPI.Comm, optional) – The MPI communicator used to distribute, by default MPI.COMM_WORLD.

Returns:

The distributed array, and the size of the full array.

Return type:

(numpy.ndarray, tuple(int))

Raises:
  • ValueError – If the numpy version used to save the file is not supported.

  • NotImplementedError – If the array is saved in Fortran order.

cbi_toolbox.parallel.mpi.redistribute(array, src_axis, tgt_axis, full_shape=None, mpi_comm=<mpi4py.MPI.Intracomm object>)[source]

Redistribute an array along a different dimension.

Parameters:
  • array (numpy.ndarray) – The distributed array.

  • src_axis (int) – The original axis on which the array is distributed.

  • tgt_axis (int) – The axis on which the array is to be distributed.

  • full_shape (tuple(int), optional) – The full shape of the array, by default None.

  • mpi_comm (mpi4py.MPI.Comm, optional) – The MPI communicator used to distribute, by default MPI.COMM_WORLD.

Returns:

The array distributed along the new axis.

Return type:

np.ndarray

cbi_toolbox.parallel.mpi.save(file_name, array, axis, full_shape=None, mpi_comm=<mpi4py.MPI.Intracomm object>)[source]

Save a numpy array from parallel jobs in the MPI communicator. The array is gathered along the chosen dimension.

Parameters:
  • file_name (str) – The numpy array file to load.

  • array (numpy.ndarray) – The distributed array.

  • axis (int) – The axis on which to distribute the array.

  • full_shape (tuple(int), optional) – The size of the full array, by default None.

  • mpi_comm (mpi4py.MPI.Comm, optional) – The MPI communicator used to distribute, by default MPI.COMM_WORLD.

cbi_toolbox.parallel.mpi.to_mpi_datatype(np_datatype)[source]

Returns the MPI datatype corresponding to the numpy dtype provided.

Parameters:

np_datatype (numpy.dtype or str) – The numpy datatype, or name.

Returns:

The corresponding MPI datatype.

Return type:

mpi4py.MPI.Datatype

Raises:

NotImplementedError – If the numpy datatype is not listed in the conversion table.

cbi_toolbox.parallel.mpi.wait_all(mpi_comm=<mpi4py.MPI.Intracomm object>)[source]

Wait for all processes to reach this line (MPI barrier) This is just a wrapper for ease.

Parameters:

mpi_comm (mpi4py.MPI.Comm, optional) – The communicator, by default MPI.COMM_WORLD.