Tutorial level: Intro




Game of Life on GPU - Interactive & Extensible

This notebook demonstrates some functions of Cling-CUDA and Jupyter Notebooks and gives an idea what you can do with C++ in a web browser. The example shows the usual workflow of simulation and analysis. The simulation runs Conway's Game of Life on a GPU.

You can jump directly to the functions and look at them, but independent execution of the cells is not possible because there are some dependencies. Just start with the first cell and run (Shift + Enter) the cells with one exception(see nonlinear program flow) downwards.

The following functions can be found in the notebook:

Cling allows to include and link existing code. For this project some help functions were written, which are loaded and compiled at runtime. Additionally, the external library pngwriter is used to create images from the data. The source code for this example was downloaded from git and compiled as a shared library.

Commands beginning with a dot are Cling metacommands. They make it possible to manipulate the state of the cling instance. For example, .I allows you to specify an include path at runtime.

set include path

In [1]:
.I pngwriter/include
In [2]:
#include <fstream>
#include <vector>
#include <sstream>
#include <chrono>
#include <thread>

// lib pngwriter
#define NO_FREETYPE
#include <pngwriter.h>

// self-defined help functions
#include "color_maps.hpp"
#include "input_reader.hpp"
#include "png_generator.hpp"

// help functions for additional notebook functions
#include "xtl/xbase64.hpp"
#include "xeus/xjson.hpp"

load precompield shared libary (pngwriter)

In [3]:
.L pngwriter/lib/libPNGwriter.so

Help functions

In [4]:
// checks whether the return value was successful
// If not, print an error message
inline void cuCheck(cudaError_t code){
    if(code != cudaSuccess){
        std::cerr << "Error code: " << code << std::endl << cudaGetErrorString(code) << std::endl;
    }
}
In [5]:
// display image in the notebook
void display_image(std::vector< unsigned char> & image, bool clear_ouput){
    // memory objects for output in the web browser
    std::stringstream buffer;
    xeus::xjson mine;

    if(clear_ouput)
        xeus::get_interpreter().clear_output(true);

    buffer.str("");
    for(auto c : image){
        buffer << c;
    }

    mine["image/png"] = xtl::base64encode(buffer.str());
    xeus::get_interpreter().display_data(
        std::move(mine),
        xeus::xjson::object(),
        xeus::xjson::object());
}

CUDA Kernels

  • define kernels at runtime
  • unfortunately C++ does not allow redefinition
In [6]:
// Improve the memory access (reduce the memory control actions)
// The kernel appends the first row to the last row and the last before the first
// The top line is also appended below the last line and vice versa

__global__ void copy_ghostcells(int dim, int *world) {
  int col = blockIdx.x * blockDim.x + threadIdx.x;
  int row = blockIdx.y * blockDim.y + threadIdx.y;

  // ignore the first two threads: only needed to copy ghost cells for columns 
  if(col > 1) {
    if(row == 0) {
      // Copy first real row to bottom ghost row
      world[col-1 + (dim+2)*(dim+1)] = world[(dim+2)     + col-1];
    }else{
      // Copy last real row to top ghost row
      world[col-1]                   = world[(dim+2)*dim + col-1];
    }
  }
  __syncthreads();

  if(row == 0) {
    // Copy first real column to right most ghost column
    world[col*(dim+2)+dim+1] = world[col*(dim+2) + 1];
  } else {
    // Copy last real column to left most ghost column
    world[col*(dim+2)      ] = world[col*(dim+2) + dim];
  }

}
In [7]:
// main kernel which calculates an iteration of game of life
__global__ void GOL_GPU(int dim, int *world, int *newWorld) {
   int row = blockIdx.y * blockDim.y + threadIdx.y + 1;
   int col = blockIdx.x * blockDim.x + threadIdx.x + 1;
   int id = row*(dim+2) + col;

   int numNeighbors;
   int cell = world[id];

   numNeighbors =   world[id+(dim+2)]   // lower
      + world[id-(dim+2)]               // upper
      + world[id+1]                     // right
      + world[id-1]                     // left

      + world[id+(dim+3)]   // diagonal lower right
      + world[id-(dim+3)]   // diagonal upper left
      + world[id-(dim+1)]   // diagonal upper right
      + world[id+(dim+1)];  // diagonal lower left

   if (cell == 1 && numNeighbors < 2)
      newWorld[id] = 0;

    // 2) Any live cell with two or three live neighbours lives
    else if (cell == 1 && (numNeighbors == 2 || numNeighbors == 3))
      newWorld[id] = 1;

    // 3) Any live cell with more than three live neighbours dies
    else if (cell == 1 && numNeighbors > 3)
      newWorld[id] = 0;

    // 4) Any dead cell with exactly three live neighbours becomes a live cell
    else if (cell == 0 && numNeighbors == 3)
      newWorld[id] = 1;

    else
      newWorld[id] = cell;
}

Game of Life Setup

In [8]:
// size of the world
const unsigned int dim = 10;
// two extra columns and rows for ghostcells
const unsigned int world_size = dim + 2;
unsigned int iterations = 5;

// pointer for host and device memory 
int * sim_world;
int * d_sim_world;
int * d_new_sim_world;
int * d_swap;

// saves the images of each simulation step
std::vector< std::vector< unsigned char > > sim_pngs;
// describe the color of a living or dead cell
BlackWhiteMap<int> bw_map;
In [9]:
// allocate memory on CPU and GPU
sim_world = new int[ world_size * world_size ];
cuCheck(cudaMalloc( (void **) &d_sim_world, sizeof(int)*world_size*world_size));
cuCheck(cudaMalloc( (void **) &d_new_sim_world, sizeof(int)*world_size*world_size));

Interactive Input

  • Jupyter Notebook offers "magic" commands that provide language-independent functions
  • magic commands starts with %%
  • %%file [name] writes the contents of a cell to a file
    • the file is stored in the same folder as the notebook and can be loaded via C/C++ functions
  • depends on the language kernel

Define the initial world for the Game-of-Life simulation. X are living cells and 0 are dead.

In [10]:
%%file input.txt
0 0 0 0 0 0 0 0 0 0
0 0 X 0 0 0 0 0 0 0
0 0 0 X 0 0 0 0 0 0
0 X X X 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
Overwriting input.txt

Cling I/O-System

  • read_input() reads the initial world from a file and returns an error code
  • Return values:
    • 0 = success
    • -1 = file cannot be opened
    • -2 = too many elements in file -> extra elements are ignored
In [11]:
read_input("input.txt", sim_world, dim, dim, true)
Out[11]:
0
In [12]:
cuCheck(cudaMemcpy(d_sim_world, sim_world, sizeof(int)*world_size*world_size, cudaMemcpyHostToDevice));
In [13]:
// create an image of the initial world
sim_pngs.push_back(generate_png<int>(sim_world, world_size, world_size, &bw_map, true, 20));

Interactive Simulation: Main Loop

  • calculate the game of life
In [17]:
// main loop
for(unsigned int i = 0; i < iterations; ++i) {

    copy_ghostcells<<<1, dim3(dim+2, 2, 1)>>>(dim, d_sim_world);
    GOL_GPU<<<1, dim3(dim, dim, 1)>>>(dim, d_sim_world, d_new_sim_world);
    cuCheck(cudaDeviceSynchronize());

    d_swap = d_new_sim_world;
    d_new_sim_world = d_sim_world;
    d_sim_world = d_swap;

    cuCheck(cudaMemcpy(sim_world, d_sim_world, sizeof(int)*world_size*world_size, cudaMemcpyDeviceToHost));
    sim_pngs.push_back(generate_png<int>(sim_world, world_size, world_size, &bw_map, true, 20));
}

Display Simulation Images

  • xeus-cling offers a built-in C++ library for displaying media content in notebooks
  • see xeus-cling example
In [18]:
for(int i = 0; i < sim_pngs.size(); ++i) {
    display_image(sim_pngs[i], true);
    std::cout << "iteration = " << i << std::endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(800));
}
iteration = 8

Nonlinear Program Flow

Jupyter Notebook enables nonlinear program execution. You can execute a cell again. The result may vary depending on the source code and the state of the runtime.

For example, if you repeat the main loop of the simulation, the simulation continues because the state of the simulation is in the 5th iteration. If you run the cell again, you calculate step 6 to 10 of the simulation. You can also change cell variables. Simply set the iterations variable to 3, run the main loop and the visualization cell again and see what happens.

In [16]:
iterations = 3;

In-Situ Data Analysis

After the simulation, the results must be analyzed. Often it is processed by an additional process, which means that you have to write your data to disk and reload it. Depending on the simulation, it can take a long time. Alternatively, you can integrate your analysis into the simulation. Then you don't need the time to save and load the data, but you need to know what you want to analyze before running the simulation. If you want to perform another analysis, e.g. because you get new insights from a previous analysis, you have to run the simulation again.

Cling can combine the advantages of both methods. You can add an analysis at runtime and analyze the simulation data without copying it.

Add a new analysis

Count the neighbors of a cell and display them as a heat map.

  • persistent simulation data on the GPU
  • add analysis on-the-fly and in-memory

Dynamic Extension Without Loss of State

Define an analysis kernel on-the-fly.

In [19]:
// counts the neighbors of a cell
__global__ void get_num_neighbors(int dim, int *world, int *newWorld) {
   int row = blockIdx.y * blockDim.y + threadIdx.y + 1;
   int col = blockIdx.x * blockDim.x + threadIdx.x + 1;
   int id = row*(dim+2) + col;

   newWorld[id] =   world[id+(dim+2)]   // lower
      + world[id-(dim+2)]               // upper
      + world[id+1]                     // right
      + world[id-1]                     // left

      + world[id+(dim+3)]   // diagonal lower right
      + world[id-(dim+3)]   // diagonal upper left
      + world[id-(dim+1)]   // diagonal upper right
      + world[id+(dim+1)];  // diagonal lower left
}
In [20]:
// allocate extra memory on the GPU to ouput the analysis
int * d_ana_world;
cuCheck(cudaMalloc( (void **) &d_ana_world, sizeof(int)*world_size*world_size));

// allocate memory on CPU to generate an image
std::vector< std::vector< unsigned char > > ana_pngs;
int * ana_world = new int[world_size*world_size];
In [21]:
// run the analysis
// uuse the simulation data as input and write the result into an extra memory
get_num_neighbors<<<1,dim3(dim, dim, 1)>>>(dim, d_sim_world, d_ana_world);
In [22]:
// copy analysis data to the CPU
cuCheck(cudaMemcpy(ana_world, d_ana_world, sizeof(int)*world_size*world_size, cudaMemcpyDeviceToHost));
In [23]:
// define a color map for the heat map
// use the same code which has generated images of the game of life world
template <typename varTyp>
struct HeatMap : ColorMap<varTyp>
{
    int r(varTyp value){return value * 65535/8;}
    int g(varTyp value){return 0;}
    int b(varTyp value){return 0;}
};
HeatMap<int> h_map;
In [25]:
// generate a heat map image
ana_pngs.push_back(generate_png<int>(ana_world, world_size, world_size, &h_map, true, 20));
In [18]:
for(int i = 0; i < sim_pngs.size(); ++i) {
    display_image(sim_pngs[i], true);
    std::cout << "iteration = " << i << std::endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(800));
}
iteration = 8
In [26]:
for(int i = 0; i < ana_pngs.size(); ++i) {
    display_image(ana_pngs[i], true);
    std::cout << "iteration = " << i << std::endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(800));
}
iteration = 0

Continue Simulation-Analysis-Loop

You have completed your first iteration of the analysis. Now you can continue the simulation with

In [ ]:
iterations = 3;

steps and run the main loop and the neighborhood analysis again

In [ ]:
// copy together the content of different cells, to avoid extra navigation

// main loop
for(unsigned int i = 0; i < iterations; ++i) {

    copy_ghostcells<<<1, dim3(dim+2, 2, 1)>>>(dim, d_sim_world);
    GOL_GPU<<<1, dim3(dim, dim, 1)>>>(dim, d_sim_world, d_new_sim_world);
    cuCheck(cudaDeviceSynchronize());

    d_swap = d_new_sim_world;
    d_new_sim_world = d_sim_world;
    d_sim_world = d_swap;

    cuCheck(cudaMemcpy(sim_world, d_sim_world, sizeof(int)*world_size*world_size, cudaMemcpyDeviceToHost));
    sim_pngs.push_back(generate_png<int>(sim_world, world_size, world_size, &bw_map, true, 20));
}

// run the analysis
// use the simulation data as input and write the result to an extra chuck of memory
get_num_neighbors<<<1,dim3(dim, dim, 1)>>>(dim, d_sim_world, d_ana_world);

// copy analysis data to CPU
cuCheck(cudaMemcpy(ana_world, d_ana_world, sizeof(int)*world_size*world_size, cudaMemcpyDeviceToHost));

ana_pngs.clear();

// generate heat map image
ana_pngs.push_back(generate_png<int>(ana_world, world_size, world_size, &h_map, true, 20));

for(int i = 0; i < sim_pngs.size(); ++i) {
    display_image(sim_pngs[i], true);
    std::cout << "iteration = " << i << std::endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(800));
}

display_image(ana_pngs.back(), false);

or develop a new analysis:

In [ ]:
// time for new code

Click here to jump to the visualization cell of the simulation and display all simulation steps.

Resetting the simulation without restarting the kernel

If you want to calculate the simulation with a new initial world without restarting the kernel, you must reset the following variables.

In [ ]:
// load a new inital world in the host memory
read_input("input.txt", sim_world, dim, dim, true);
// copy the world to the device
cuCheck(cudaMemcpy(d_sim_world, sim_world, sizeof(int)*world_size*world_size, cudaMemcpyHostToDevice));
// delete the old images
sim_pngs.clear();
// create an image of the initial world
sim_pngs.push_back(generate_png<int>(sim_world, world_size, world_size, &bw_map, true, 20));
In [ ]: