Granular Dynamics Simulation on Multiple GPUs Using Domain Decomposition
This paper describes the software infrastructure needed to enable massive multi-body simulation on multiple GPUs. Using a domain decomposition approach, a large system made up of billions of bodies can be split into self-contained subdomains, which are then transferred to different GPUs and solved in parallel. Parallelism is exploited at two levels: on the CPU through OpenMP, and on the GPU through NVIDIA CUDA (Compute Unified Device Architecture). Because each subdomain is self-contained, this heterogeneous software infrastructure can be extended to networks of computers using MPI (Message Passing Interface). The paper discusses the implementation of the spatial subdivision algorithm used for subdomain creation, along with the algorithms used for collision detection and constraint solution.
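To make the idea of self-contained subdomains concrete, the following host-side C++ sketch shows one way a spatial subdivision could assign bodies to subdomains. It is an illustration under stated assumptions, not the implementation described in this paper: the `Sphere` type, the function `decomposeAlongX`, the choice of equal-width slabs along a single axis, and the policy of copying a body into every slab it overlaps are all hypothetical. Copying boundary-straddling bodies is one plausible way to keep each subdomain solvable on its own GPU without consulting its neighbors.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical body type: a sphere given by its center and radius.
struct Sphere {
    double x, y, z;
    double radius;
};

// Cut the interval [xMin, xMax) into numSubdomains equal-width slabs and
// return, for each slab, the indices of all spheres whose extent along x
// overlaps that slab. A sphere that straddles a slab boundary is listed in
// both slabs (assumed policy, so each slab holds everything it needs).
std::vector<std::vector<std::size_t>> decomposeAlongX(
    const std::vector<Sphere>& bodies,
    double xMin, double xMax, int numSubdomains)
{
    assert(numSubdomains > 0 && xMax > xMin);
    const double width = (xMax - xMin) / numSubdomains;
    std::vector<std::vector<std::size_t>> subdomains(numSubdomains);

    for (std::size_t i = 0; i < bodies.size(); ++i) {
        const Sphere& b = bodies[i];
        // Slab indices touched by the sphere's lowest and highest x extent.
        int first = static_cast<int>(std::floor((b.x - b.radius - xMin) / width));
        int last  = static_cast<int>(std::floor((b.x + b.radius - xMin) / width));
        // Clamp so bodies poking outside the domain land in the edge slabs.
        first = std::max(0, std::min(first, numSubdomains - 1));
        last  = std::max(0, std::min(last,  numSubdomains - 1));
        for (int s = first; s <= last; ++s) {
            subdomains[s].push_back(i);
        }
    }
    return subdomains;
}
```

In a multi-GPU setting, each returned index list would correspond to the set of bodies copied to one device; the outer loop over subdomains is also the natural place for CPU-level parallelism, with one host thread driving each GPU.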