A Julia wrapper for the NVIDIA Collective Communications Library (NCCL). NCCL is NVIDIA's library for multi-GPU and multi-node communication, optimized for NVIDIA GPUs. Its API is designed to resemble MPI's, but there are some differences. For example, unlike CUDA-aware MPI, NCCL can drive multiple devices from a single process. The NCCL documentation describes in more detail how NCCL compares with MPI; in fact, the two can be used together.
NCCL is used internally by several other CUDA libraries, for example cuSOLVERMp.