Research

High Performance Stochastic Simulation on the Graphics Processing Unit

Discrete stochastic simulation requires many realizations to capture accurate statistical information about the solution. This carries a very high computational cost, so parallel algorithms are needed. The general-purpose Graphics Processing Unit (GPU) has become a viable option for many parallel programming applications. Compared with a CPU, the GPU has a highly parallel architecture with high memory bandwidth, and it devotes more transistors to data processing than to data caching and flow control. Problems that can be implemented with stream processing and that use limited memory are well suited to the GPU. There are essentially two ways to improve performance: parallelizing the simulation across realizations, and parallelizing it within a single realization. We have been experimenting with parallelization across realizations for stochastic simulation of biochemical systems, and with parallelization simultaneously across realizations and within a single realization for the simulation of fish schools.
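
As a rough illustration of parallelization across realizations, the sketch below runs many independent realizations of a small stochastic simulation, each with its own random stream, and then collects ensemble statistics. It is not our GPU implementation. It is a CPU-side Python sketch, and the model (a single decay reaction X -> 0 simulated with Gillespie's direct method), the function names `run_realization` and `run_ensemble`, and all parameter values are assumptions chosen for illustration. On a GPU, each realization would typically be mapped to its own thread; Python threads are used here only to show the decomposition, not to gain speed.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def run_realization(seed, x0=100, rate=1.0, t_end=1.0):
    """One realization of the decay reaction X -> 0; returns X(t_end).

    Hypothetical example model, simulated with Gillespie's direct method.
    """
    rng = random.Random(seed)  # private random stream for this realization
    x, t = x0, 0.0
    while x > 0:
        propensity = rate * x             # total reaction propensity
        t += rng.expovariate(propensity)  # exponential waiting time
        if t > t_end:
            break                         # next reaction falls past t_end
        x -= 1                            # fire the decay reaction
    return x


def run_ensemble(n_realizations, workers=4):
    """Run independent realizations concurrently; seed i drives realization i."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_realization, range(n_realizations)))


if __name__ == "__main__":
    finals = run_ensemble(1000)
    # For this decay model the exact mean is x0 * exp(-rate * t_end).
    print("sample mean:", mean(finals))
    print("exact mean: ", 100 * math.exp(-1.0))
```

Because the realizations share no state, they need no communication until the final reduction over the ensemble, which is what makes this form of parallelism a natural fit for stream processing.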

This work is in collaboration with Professor Jeff Moehlis (UCSB Department of Mechanical Engineering) and Professor Iain Couzin (Princeton University, Department of Ecology and Evolutionary Biology), and their research groups.
Group polarization as a function of r averaged over 1120 steady-state replicate simulations run in parallel on the GPU. (a) A realization of the swarm state (r = 0.125), (b) a realization of the dynamic parallel state (r = 2), (c) a realization of the highly parallel state (r = 1000).
Memory layout
Hardware Model: A set of SIMD multiprocessors with on-chip shared memory (from NVIDIA)