Forrest Glines and Faculty Mentor: David Neilsen, Physics Department
This research project concerns the development of simulation code to confirm neutron star mergers as the progenitors of Short Hard Gamma-Ray Bursts. Short Hard Gamma-Ray Bursts (SHGBs) are brief (less than two seconds) high-energy bursts that we observe with satellites. Their exact cause has not yet been confirmed, but they are believed to be created by the merger of two neutron stars, or of a neutron star falling into a black hole. Neutron stars are ultra-dense, highly magnetic, compact stars at the end of their evolution. In a binary system the stars lose angular momentum to gravitational waves and come together in a hot merger. A black hole quickly forms, cutting off further emission. The high energy of the merger and the formation of a black hole make these events prime candidates for SHGB progenitors. Through careful simulation, it may be possible to confirm this hypothesis.
These mergers are very complicated to simulate, requiring the code to capture both the intense magnetic fields and the relativistic effects of the merger. Codes that can perform these simulations do exist; however, they take months to complete. To make it practical to experiment with different merger scenarios, new codes that take advantage of modern computer hardware will have to be developed.
One important trend in scientific computing is the use of coprocessors to supplement the traditional CPU on a motherboard. Coprocessors, such as GPUs and Intel Xeon Phis, are devices with hundreds of times more cores than conventional processors, and they provide more processing power for less money and power consumption. These cores, however, are limited in that they run in parallel semi-synchronously, rather than completely independently like the 16 cores on a traditional CPU. Unlike in previous decades, CPU speed, cost, and capabilities have somewhat stagnated, leading to the adoption of these coprocessors in modern supercomputers. Given their growing ubiquity, modern scientific codes will have to use these devices efficiently.
This research project entailed implementing a relativistic magnetohydrodynamics (MHD) code on NVIDIA GPUs, using the existing CPU code HAD¹ as a basis. The new code² was successfully tested on a supercomputer with over 100 GPU nodes. It showed faster performance than HAD and exhibited good scaling properties.
The high-resolution shock-capturing implementation mirrors the Godunov-type method of HAD. Godunov methods form a class of hydrodynamics solvers that approximate the state of the fluid on a grid, tracking values such as the density, pressure, and velocity of the fluid. At each time step, the fluid values near each face of each grid cell are reconstructed. The code then uses the reconstructed values on both sides of each cell interface to solve what is called the Riemann Problem. The Riemann Problem is a well-studied problem in hydrodynamics that consists of two regions of a fluid (with differing densities, pressures, etc.) initially separated by a flat boundary that is suddenly removed. The solution to the Riemann Problem yields an approximation of the flux, or movement of fluid, across the interface. With the fluxes from the six faces of each cell, the change to each fluid variable for the current time step is calculated. The evolution in time is integrated with a third-order Runge-Kutta integrator.
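The steps above can be sketched in one dimension. The following Python example is a minimal illustration only, not the actual GPU code (which evolves the full relativistic MHD system in three dimensions): it advects a scalar pulse using piecewise-constant reconstruction at cell faces, the exact Riemann solution for linear advection (the upwind flux), and a third-order strong-stability-preserving Runge-Kutta integrator.

```python
import numpy as np

def riemann_flux(uL, uR, a=1.0):
    # For linear advection u_t + a u_x = 0 with a > 0, the Riemann
    # problem at each face has an exact solution: the upwind flux a*uL.
    return a * uL

def rhs(u, dx):
    # Piecewise-constant reconstruction: the left/right states at face
    # i+1/2 are the neighboring cell averages (periodic boundaries).
    uL = u                      # state on the left of face i+1/2
    uR = np.roll(u, -1)         # state on the right of face i+1/2
    F = riemann_flux(uL, uR)    # flux through face i+1/2
    # The net flux difference across each cell's two faces gives its update.
    return -(F - np.roll(F, 1)) / dx

def ssp_rk3_step(u, dt, dx):
    # Third-order strong-stability-preserving Runge-Kutta time step.
    u1 = u + dt * rhs(u, dx)
    u2 = 0.75 * u + 0.25 * (u1 + dt * rhs(u1, dx))
    return u / 3.0 + 2.0 / 3.0 * (u2 + dt * rhs(u2, dx))

# Advect a Gaussian pulse once around a periodic domain.
n = 200
x = np.linspace(0.0, 1.0, n, endpoint=False)
dx = 1.0 / n
u = np.exp(-100.0 * (x - 0.5) ** 2)
dt = 0.5 * dx                   # CFL-limited time step
for _ in range(int(round(1.0 / dt))):
    u = ssp_rk3_step(u, dt, dx)
```

Because the update is written in flux form, the total amount of fluid is conserved to machine precision, which is the property that makes Godunov-type schemes robust across shocks.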
I tested the code on Indiana University’s supercomputer Big Red II. This machine is unusual in that it has over 600 GPU nodes with NVIDIA Tesla K20 GPUs, and I was able to obtain dedicated time to test the code using the entire system. The code was compared to the established relativistic MHD code HAD. For comparison tests, a Gaussian pulse (or blast wave) on a 160³ grid was evolved in both HAD and the new GPU code. The densities in the two solutions differed by less than 10⁻¹⁴, within machine error, after many iterations in time, indicating that they give equivalent results. On a single node, the GPU code also ran 2.5x faster than HAD on 32 CPU cores, and 80x faster than HAD on a single core. The GPU code also demonstrated good strong and weak scaling. Strong scaling means that dividing a fixed-size problem among many nodes proportionally decreases the run time, while weak scaling means that the run time stays constant when each node is given the same amount of work, regardless of the total problem size. The need for communication between nodes inevitably prevents perfect scaling, but this overhead proved negligible for the GPU code.
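These two scaling notions can be made concrete with a small calculation. The Python snippet below uses hypothetical timings, not measurements from Big Red II, to show how parallel efficiency is computed in each case (an efficiency of 1.0 is perfect scaling):

```python
def strong_scaling_efficiency(t1, tn, n):
    # Ideal strong scaling: a fixed-size problem on n nodes runs in t1/n,
    # so efficiency compares the measured time tn against that ideal.
    return t1 / (n * tn)

def weak_scaling_efficiency(t1, tn):
    # Ideal weak scaling: with fixed work per node, the run time on n
    # nodes should equal the single-node time t1.
    return t1 / tn

# Hypothetical timings in seconds (illustrative, not real data).
print(strong_scaling_efficiency(100.0, 14.0, 8))  # ≈ 0.893
print(weak_scaling_efficiency(100.0, 105.0))      # ≈ 0.952
```

Communication between nodes is what pushes both efficiencies below 1.0 in practice.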
The code shows promise. It demonstrates that a GPU code for high resolution shock capturing can scale well, run several times faster than a traditional CPU based code, and still give accurate results. With these positive results, development of the code can continue so that full scale simulations of neutron star mergers can be done on GPUs.
In the future, I plan to add Adaptive Mesh Refinement (AMR) to the code. Instead of using one static grid, AMR refines the coarse grid with overlaid finer grids to solve the system more accurately at points of interest. The points needing refinement are computed as the simulation runs. AMR requires a more complicated interaction between grids, and the code must also balance work loads across processors, as the finer grids progress more slowly than the coarser grids. However, given the initial success of the static-grid implementation, creating an AMR version of the code will be well worth the effort.
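As a rough illustration of the refinement step, the following one-dimensional Python sketch is hypothetical: real AMR codes such as HAD use more sophisticated error estimators and full grid hierarchies. Here, cells with steep gradients are flagged as the simulation runs, and a factor-of-two finer grid is overlaid on the flagged region.

```python
import numpy as np

def flag_for_refinement(u, dx, threshold):
    # Flag cells whose centered-difference gradient magnitude exceeds a
    # threshold; a simple stand-in for a real AMR error estimator.
    return np.abs(np.gradient(u, dx)) > threshold

def refine_patch(u, flags):
    # Overlay a finer grid (refinement factor 2) on the flagged span by
    # linearly interpolating the coarse cell values.
    idx = np.where(flags)[0]
    if idx.size == 0:
        return None, None
    lo, hi = idx.min(), idx.max() + 1
    coarse = u[lo:hi]
    fine = np.interp(np.arange(2 * coarse.size) / 2.0,
                     np.arange(coarse.size), coarse)
    return (lo, hi), fine

x = np.linspace(0.0, 1.0, 64)
u = np.tanh((x - 0.5) / 0.02)        # sharp feature around x = 0.5
flags = flag_for_refinement(u, x[1] - x[0], threshold=5.0)
span, fine = refine_patch(u, flags)  # fine grid covers only the feature
```

Only the cells around the sharp feature are refined, which is what keeps AMR cheaper than uniformly refining the whole grid; the load-balancing challenge mentioned above arises because these small fine patches must take more, smaller time steps than the surrounding coarse grid.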
¹ M. Anderson, E. Hirschmann, S. Liebling, and D. Neilsen, “Relativistic MHD and adaptive mesh refinement,” Class. Quantum Grav., vol. 23, pp. 6503–6524, 2006.
² F. Glines, M. Anderson, and D. Neilsen, “Scalable relativistic high-resolution shock-capturing for heterogeneous computing,” Cluster Computing (CLUSTER), 2015 IEEE International Conference on, pp. 611–618, 2015.