Game of Life on GPU Using Cling-CUDA
Tutorial level: Intro
Game of Life on GPU - Interactive & Extensible¶
This notebook demonstrates some functions of Cling-CUDA and Jupyter Notebooks and gives an idea what you can do with C++ in a web browser. The example shows the usual workflow of simulation and analysis. The simulation runs Conway's Game of Life on a GPU.
You can jump directly to the functions and look at them, but independent execution of the cells is not possible because there are some dependencies. Just start with the first cell and run (Shift + Enter
) the cells with one exception(see nonlinear program flow) downwards.
The following functions can be found in the notebook:
- reuse code via header files and shared libraries
- interactive definition of CUDA kernels
- write a cell directly in a file (magic command)
- the simple I/O system of Cling
- display images directly in the notebook
- nonlinear program flow
- In-Situ data analysis (zero copy)
- dynamic extension of the program without loss of state
- continue the simulation analysis loop without additional calculation or memory copies
Include and Link Files¶
Cling allows to include and link existing code. For this project some help functions were written, which are loaded and compiled at runtime. Additionally, the external library pngwriter
is used to create images from the data. The source code for this example was downloaded from git and compiled as a shared library.
Commands beginning with a dot are Cling metacommands. They make it possible to manipulate the state of the cling instance. For example, .I
allows you to specify an include path at runtime.
set include path
.I pngwriter/include
#include <fstream>
#include <vector>
#include <sstream>
#include <chrono>
#include <thread>
// lib pngwriter
#define NO_FREETYPE
#include <pngwriter.h>
// self-defined help functions
#include "color_maps.hpp"
#include "input_reader.hpp"
#include "png_generator.hpp"
// help functions for additional notebook functions
#include "xtl/xbase64.hpp"
#include "xeus/xjson.hpp"
load precompield shared libary (pngwriter)
.L pngwriter/lib/libPNGwriter.so
Help functions¶
// checks whether the return value was successful
// If not, print an error message
inline void cuCheck(cudaError_t code){
if(code != cudaSuccess){
std::cerr << "Error code: " << code << std::endl << cudaGetErrorString(code) << std::endl;
}
}
// display image in the notebook
void display_image(std::vector< unsigned char> & image, bool clear_ouput){
// memory objects for output in the web browser
std::stringstream buffer;
xeus::xjson mine;
if(clear_ouput)
xeus::get_interpreter().clear_output(true);
buffer.str("");
for(auto c : image){
buffer << c;
}
mine["image/png"] = xtl::base64encode(buffer.str());
xeus::get_interpreter().display_data(
std::move(mine),
xeus::xjson::object(),
xeus::xjson::object());
}
CUDA Kernels¶
- define kernels at runtime
- unfortunately C++ does not allow redefinition
// Improve the memory access (reduce the memory control actions)
// The kernel appends the first row to the last row and the last before the first
// The top line is also appended below the last line and vice versa
__global__ void copy_ghostcells(int dim, int *world) {
int col = blockIdx.x * blockDim.x + threadIdx.x;
int row = blockIdx.y * blockDim.y + threadIdx.y;
// ignore the first two threads: only needed to copy ghost cells for columns
if(col > 1) {
if(row == 0) {
// Copy first real row to bottom ghost row
world[col-1 + (dim+2)*(dim+1)] = world[(dim+2) + col-1];
}else{
// Copy last real row to top ghost row
world[col-1] = world[(dim+2)*dim + col-1];
}
}
__syncthreads();
if(row == 0) {
// Copy first real column to right most ghost column
world[col*(dim+2)+dim+1] = world[col*(dim+2) + 1];
} else {
// Copy last real column to left most ghost column
world[col*(dim+2) ] = world[col*(dim+2) + dim];
}
}
// main kernel which calculates an iteration of game of life
__global__ void GOL_GPU(int dim, int *world, int *newWorld) {
int row = blockIdx.y * blockDim.y + threadIdx.y + 1;
int col = blockIdx.x * blockDim.x + threadIdx.x + 1;
int id = row*(dim+2) + col;
int numNeighbors;
int cell = world[id];
numNeighbors = world[id+(dim+2)] // lower
+ world[id-(dim+2)] // upper
+ world[id+1] // right
+ world[id-1] // left
+ world[id+(dim+3)] // diagonal lower right
+ world[id-(dim+3)] // diagonal upper left
+ world[id-(dim+1)] // diagonal upper right
+ world[id+(dim+1)]; // diagonal lower left
if (cell == 1 && numNeighbors < 2)
newWorld[id] = 0;
// 2) Any live cell with two or three live neighbours lives
else if (cell == 1 && (numNeighbors == 2 || numNeighbors == 3))
newWorld[id] = 1;
// 3) Any live cell with more than three live neighbours dies
else if (cell == 1 && numNeighbors > 3)
newWorld[id] = 0;
// 4) Any dead cell with exactly three live neighbours becomes a live cell
else if (cell == 0 && numNeighbors == 3)
newWorld[id] = 1;
else
newWorld[id] = cell;
}
Game of Life Setup¶
// size of the world
const unsigned int dim = 10;
// two extra columns and rows for ghostcells
const unsigned int world_size = dim + 2;
unsigned int iterations = 5;
// pointer for host and device memory
int * sim_world;
int * d_sim_world;
int * d_new_sim_world;
int * d_swap;
// saves the images of each simulation step
std::vector< std::vector< unsigned char > > sim_pngs;
// describe the color of a living or dead cell
BlackWhiteMap<int> bw_map;
// allocate memory on CPU and GPU
sim_world = new int[ world_size * world_size ];
cuCheck(cudaMalloc( (void **) &d_sim_world, sizeof(int)*world_size*world_size));
cuCheck(cudaMalloc( (void **) &d_new_sim_world, sizeof(int)*world_size*world_size));
Interactive Input¶
- Jupyter Notebook offers "magic" commands that provide language-independent functions
- magic commands starts with
%%
-
%%file [name]
writes the contents of a cell to a file- the file is stored in the same folder as the notebook and can be loaded via C/C++ functions
- depends on the language kernel
Define the initial world for the Game-of-Life simulation. X
are living cells and 0
are dead.
%%file input.txt
0 0 0 0 0 0 0 0 0 0
0 0 X 0 0 0 0 0 0 0
0 0 0 X 0 0 0 0 0 0
0 X X X 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
Cling I/O-System¶
- read_input() reads the initial world from a file and returns an error code
-
Return values:
- 0 = success
- -1 = file cannot be opened
- -2 = too many elements in file -> extra elements are ignored
read_input("input.txt", sim_world, dim, dim, true)
cuCheck(cudaMemcpy(d_sim_world, sim_world, sizeof(int)*world_size*world_size, cudaMemcpyHostToDevice));
// create an image of the initial world
sim_pngs.push_back(generate_png<int>(sim_world, world_size, world_size, &bw_map, true, 20));
Interactive Simulation: Main Loop¶
- calculate the game of life
// main loop
for(unsigned int i = 0; i < iterations; ++i) {
copy_ghostcells<<<1, dim3(dim+2, 2, 1)>>>(dim, d_sim_world);
GOL_GPU<<<1, dim3(dim, dim, 1)>>>(dim, d_sim_world, d_new_sim_world);
cuCheck(cudaDeviceSynchronize());
d_swap = d_new_sim_world;
d_new_sim_world = d_sim_world;
d_sim_world = d_swap;
cuCheck(cudaMemcpy(sim_world, d_sim_world, sizeof(int)*world_size*world_size, cudaMemcpyDeviceToHost));
sim_pngs.push_back(generate_png<int>(sim_world, world_size, world_size, &bw_map, true, 20));
}
for(int i = 0; i < sim_pngs.size(); ++i) {
display_image(sim_pngs[i], true);
std::cout << "iteration = " << i << std::endl;
std::this_thread::sleep_for(std::chrono::milliseconds(800));
}
Nonlinear Program Flow¶
Jupyter Notebook enables nonlinear program execution. You can execute a cell again. The result may vary depending on the source code and the state of the runtime.
For example, if you repeat the main loop of the simulation, the simulation continues because the state of the simulation is in the 5th iteration. If you run the cell again, you calculate step 6 to 10 of the simulation. You can also change cell variables. Simply set the iterations
variable to 3
, run the main loop and the visualization cell again and see what happens.
iterations = 3;
In-Situ Data Analysis¶
After the simulation, the results must be analyzed. Often it is processed by an additional process, which means that you have to write your data to disk and reload it. Depending on the simulation, it can take a long time. Alternatively, you can integrate your analysis into the simulation. Then you don't need the time to save and load the data, but you need to know what you want to analyze before running the simulation. If you want to perform another analysis, e.g. because you get new insights from a previous analysis, you have to run the simulation again.
Cling can combine the advantages of both methods. You can add an analysis at runtime and analyze the simulation data without copying it.
Add a new analysis
Count the neighbors of a cell and display them as a heat map.
- persistent simulation data on the GPU
- add analysis on-the-fly and in-memory
Dynamic Extension Without Loss of State¶
Define an analysis kernel on-the-fly.
// counts the neighbors of a cell
__global__ void get_num_neighbors(int dim, int *world, int *newWorld) {
int row = blockIdx.y * blockDim.y + threadIdx.y + 1;
int col = blockIdx.x * blockDim.x + threadIdx.x + 1;
int id = row*(dim+2) + col;
newWorld[id] = world[id+(dim+2)] // lower
+ world[id-(dim+2)] // upper
+ world[id+1] // right
+ world[id-1] // left
+ world[id+(dim+3)] // diagonal lower right
+ world[id-(dim+3)] // diagonal upper left
+ world[id-(dim+1)] // diagonal upper right
+ world[id+(dim+1)]; // diagonal lower left
}
// allocate extra memory on the GPU to ouput the analysis
int * d_ana_world;
cuCheck(cudaMalloc( (void **) &d_ana_world, sizeof(int)*world_size*world_size));
// allocate memory on CPU to generate an image
std::vector< std::vector< unsigned char > > ana_pngs;
int * ana_world = new int[world_size*world_size];
// run the analysis
// uuse the simulation data as input and write the result into an extra memory
get_num_neighbors<<<1,dim3(dim, dim, 1)>>>(dim, d_sim_world, d_ana_world);
// copy analysis data to the CPU
cuCheck(cudaMemcpy(ana_world, d_ana_world, sizeof(int)*world_size*world_size, cudaMemcpyDeviceToHost));
// define a color map for the heat map
// use the same code which has generated images of the game of life world
template <typename varTyp>
struct HeatMap : ColorMap<varTyp>
{
int r(varTyp value){return value * 65535/8;}
int g(varTyp value){return 0;}
int b(varTyp value){return 0;}
};
HeatMap<int> h_map;
// generate a heat map image
ana_pngs.push_back(generate_png<int>(ana_world, world_size, world_size, &h_map, true, 20));
for(int i = 0; i < sim_pngs.size(); ++i) {
display_image(sim_pngs[i], true);
std::cout << "iteration = " << i << std::endl;
std::this_thread::sleep_for(std::chrono::milliseconds(800));
}
for(int i = 0; i < ana_pngs.size(); ++i) {
display_image(ana_pngs[i], true);
std::cout << "iteration = " << i << std::endl;
std::this_thread::sleep_for(std::chrono::milliseconds(800));
}
Continue Simulation-Analysis-Loop¶
You have completed your first iteration of the analysis. Now you can continue the simulation with
iterations = 3;
steps and run the main loop and the neighborhood analysis again
// copy together the content of different cells, to avoid extra navigation
// main loop
for(unsigned int i = 0; i < iterations; ++i) {
copy_ghostcells<<<1, dim3(dim+2, 2, 1)>>>(dim, d_sim_world);
GOL_GPU<<<1, dim3(dim, dim, 1)>>>(dim, d_sim_world, d_new_sim_world);
cuCheck(cudaDeviceSynchronize());
d_swap = d_new_sim_world;
d_new_sim_world = d_sim_world;
d_sim_world = d_swap;
cuCheck(cudaMemcpy(sim_world, d_sim_world, sizeof(int)*world_size*world_size, cudaMemcpyDeviceToHost));
sim_pngs.push_back(generate_png<int>(sim_world, world_size, world_size, &bw_map, true, 20));
}
// run the analysis
// use the simulation data as input and write the result to an extra chuck of memory
get_num_neighbors<<<1,dim3(dim, dim, 1)>>>(dim, d_sim_world, d_ana_world);
// copy analysis data to CPU
cuCheck(cudaMemcpy(ana_world, d_ana_world, sizeof(int)*world_size*world_size, cudaMemcpyDeviceToHost));
ana_pngs.clear();
// generate heat map image
ana_pngs.push_back(generate_png<int>(ana_world, world_size, world_size, &h_map, true, 20));
for(int i = 0; i < sim_pngs.size(); ++i) {
display_image(sim_pngs[i], true);
std::cout << "iteration = " << i << std::endl;
std::this_thread::sleep_for(std::chrono::milliseconds(800));
}
display_image(ana_pngs.back(), false);
or develop a new analysis:
// time for new code
Click here to jump to the visualization cell of the simulation and display all simulation steps.
Resetting the simulation without restarting the kernel¶
If you want to calculate the simulation with a new initial world without restarting the kernel, you must reset the following variables.
// load a new inital world in the host memory
read_input("input.txt", sim_world, dim, dim, true);
// copy the world to the device
cuCheck(cudaMemcpy(d_sim_world, sim_world, sizeof(int)*world_size*world_size, cudaMemcpyHostToDevice));
// delete the old images
sim_pngs.clear();
// create an image of the initial world
sim_pngs.push_back(generate_png<int>(sim_world, world_size, world_size, &bw_map, true, 20));