High Performance Stochastic Simulation on the Graphics Processing Unit
Discrete stochastic simulation requires many realizations to capture
accurate statistical information about the solution, which carries a very
high computational cost. Parallel algorithms are therefore needed for these
simulations. The general-purpose Graphics Processing Unit (GPU) has become a
viable option for many parallel programming applications. Compared with a
CPU, the GPU has a highly parallel architecture with high memory bandwidth,
and it devotes more transistors to data processing than to data caching and
flow control. Problems that can be expressed as stream processing and that
use limited memory are well suited to the GPU. There are essentially two ways
to improve performance: parallelizing the simulation across realizations, and
parallelizing it within a single realization. We have been experimenting with
parallelization across realizations for the stochastic simulation of
biochemical systems, and with parallelization both across realizations and
within a single realization for the simulation of fish schools.
This work is in collaboration with Professor Jeff Moehlis (UCSB Department of Mechanical Engineering) and Professor Iain Couzin (Princeton University, Department of Ecology and Evolutionary Biology), and their research groups.
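Parallelizing across realizations works because realizations are statistically independent. Each one can be assigned to its own GPU thread with its own random-number stream. The following is a minimal CPU-side sketch of that mapping in Python, not the authors' implementation. It assumes Gillespie's direct method and a hypothetical birth-death model (`K_BIRTH`, `K_DEATH`). The function `launch` is only a stand-in for a kernel launch. On a real GPU, the per-thread random streams would typically come from a library such as cuRAND.

```python
import math
import random

# Hypothetical birth-death model (illustration only, not from the original work):
#   birth: X -> X + 1 at rate K_BIRTH
#   death: X -> X - 1 at rate K_DEATH * x
K_BIRTH = 10.0
K_DEATH = 1.0


def ssa_realization(thread_id: int, x0: int, t_end: float, base_seed: int = 12345) -> int:
    """One realization of Gillespie's direct method, returning X(t_end).

    On a GPU this body would be the work of a single thread; thread_id selects
    an independent random stream so that realizations do not share state.
    """
    rng = random.Random(base_seed + thread_id)  # per-realization random stream
    t, x = 0.0, x0
    while True:
        a_birth = K_BIRTH
        a_death = K_DEATH * x
        a0 = a_birth + a_death  # total propensity; > 0 because K_BIRTH > 0
        # Waiting time to the next reaction is Exponential(a0).
        tau = -math.log(1.0 - rng.random()) / a0
        if t + tau > t_end:
            return x  # no further reaction before t_end
        t += tau
        # Pick the reaction that fires with probability proportional to its propensity.
        if rng.random() * a0 < a_birth:
            x += 1
        else:
            x -= 1


def launch(n_realizations: int, x0: int, t_end: float) -> list[int]:
    """Stand-in for a kernel launch: 'thread' i simulates realization i.

    The realizations are independent, so they could run in any order or all at once.
    """
    return [ssa_realization(i, x0, t_end) for i in range(n_realizations)]


if __name__ == "__main__":
    finals = launch(n_realizations=500, x0=0, t_end=10.0)
    mean = sum(finals) / len(finals)
    print(f"sample mean of X(10) over {len(finals)} realizations: {mean:.2f}")
```

For this hypothetical model, the exact mean is E[X(t)] = (K_BIRTH / K_DEATH)(1 - e^(-K_DEATH t)), which is about 10 at t = 10. The printed sample mean therefore gives a quick sanity check. Realizations take different numbers of reaction steps, so on SIMD hardware the threads in one group can diverge. This sketch does not model that cost.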
Group polarization as a function of r, averaged over 1120 steady-state
replicate simulations run in parallel on the GPU. (a) A realization of the
swarm state (r = 0.125); (b) a realization of the dynamic parallel state
(r = 2); (c) a realization of the highly parallel state (r = 1000).
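The page does not define group polarization. In collective-motion models it is usually the magnitude of the mean unit heading vector, P = (1/N) |u_1 + ... + u_N|, which is close to 0 for a swarm and close to 1 for a highly parallel group. The minimal sketch below assumes that definition, a 2-D model, and headings stored as angles in radians. The function names are hypothetical.

```python
import math


def polarization(headings: list[float]) -> float:
    """Polarization of one group: length of the mean unit heading vector.

    headings: heading angle of each individual in radians (2-D model assumed).
    Returns a value in [0, 1]: near 0 for disordered motion, 1 for full alignment.
    """
    n = len(headings)
    sum_x = sum(math.cos(theta) for theta in headings)
    sum_y = sum(math.sin(theta) for theta in headings)
    return math.hypot(sum_x, sum_y) / n


def mean_polarization(replicates: list[list[float]]) -> float:
    """Average polarization over independent replicate simulations
    (for example, one steady-state heading list per GPU realization)."""
    return sum(polarization(h) for h in replicates) / len(replicates)


print(polarization([0.0, 0.0, 0.0]))  # 1.0: all individuals aligned
print(mean_polarization([[0.0, 0.0], [0.0, math.pi]]))  # 0.5: one aligned pair, one opposed pair
```

Averaging per-replicate values, rather than pooling headings across replicates, matches the caption's description of a quantity averaged over independent runs.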
Memory layout
Hardware model: a set of SIMD multiprocessors with on-chip shared memory
(from Nvidia)
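In this hardware model, a kernel launch is a grid of thread blocks. Each block runs on one multiprocessor, and threads in the same block can cooperate through that multiprocessor's on-chip shared memory. The sketch below illustrates the thread-to-realization mapping in Python. It shows the standard CUDA index computation that assigns one thread to one realization (global index = blockIdx.x * blockDim.x + threadIdx.x). It also shows the bounds guard needed when the number of realizations is not a multiple of the block size. The block size of 128 and the function names are hypothetical choices, not the configuration used in this work.

```python
def blocks_needed(n_items: int, block_dim: int) -> int:
    """Ceiling division: enough blocks so that every item gets a thread."""
    return (n_items + block_dim - 1) // block_dim


def thread_assignment(n_items: int, block_dim: int) -> dict[int, tuple[int, int]]:
    """Map each item (e.g. a realization) to the (block, thread) that processes it.

    Mirrors CUDA's global index blockIdx.x * blockDim.x + threadIdx.x; threads
    whose global index is >= n_items fail the bounds guard and do no work.
    """
    assignment = {}
    for block in range(blocks_needed(n_items, block_dim)):
        for thread in range(block_dim):
            global_idx = block * block_dim + thread
            if global_idx < n_items:  # same role as `if (i < n)` inside a kernel
                assignment[global_idx] = (block, thread)
    return assignment


n_realizations, block_dim = 1120, 128  # block_dim is a hypothetical choice
grid = blocks_needed(n_realizations, block_dim)
print(grid, grid * block_dim - n_realizations)  # 9 32: nine blocks, 32 idle threads in the last one
```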