Game of Life on GPU - Interactive & Extensible¶

xeus-cling provides a Jupyter kernel for C++ with the help of the C++ interpreter cling and the native implementation of the Jupyter protocol xeus.

Within the xeus-cling framework, Clad can enable automatic differentiation (AD) such that users can automatically generate C++ code for their computation of derivatives of their functions.

Rosenbrock Function¶

In mathematical optimization, the Rosenbrock function is a non-convex function used as a performance test problem for optimization problems. The function is defined as:

#include "clad/Differentiator/Differentiator.h"
#include <chrono>

// Rosenbrock function declaration
double rosenbrock_func(double x, double y) {
return (x - 1) * (x - 1) + 100 * (y - x * x) * (y - x * x);
}

In order to compute the function’s derivatives, we can employ both Clad’s Forward Mode or Reverse Mode as detailed below:

Forward Mode AD¶

For a function f of several inputs and single (scalar) output, forward mode AD can be used to compute (or, in case of Clad, create a function) computing a directional derivative of f with respect to a single specified input variable. Moreover, the generated derivative function has the same signature as the original function f, however its return value is the value of the derivative.

double rosenbrock_forward(double x[], int size) {
    double sum = 0;
    auto rosenbrockX = clad::differentiate(rosenbrock_func, 0);
    auto rosenbrockY = clad::differentiate(rosenbrock_func, 1);
    for (int i = 0; i < size-1; i++) {
        double one = rosenbrockX.execute(x[i], x[i + 1]);
        double two = rosenbrockY.execute(x[i], x[i + 1]);
            sum = sum + one + two;
    }
    return sum;
}

const int size = 100000000;
double Xarray[size];
for(int i=0;i<size;i++)
  Xarray[i]=((double)rand()/RAND_MAX);

double forward_time = timeForwardMode();

Elapsed time for rosenbrock_forward: 3.038005 s
The result of the function is 3232877463.475859.

Reverse Mode AD¶

Reverse-mode AD enables the gradient computation within a single pass of the computation graph of f using at most a constant factor (around 4) more arithmetical operations compared to the original function. While its constant factor and memory overhead is higher than that of the forward-mode, it is independent of the number of inputs.

Moreover, the generated function has void return type and same input arguments. The function has an additional argument of type T*, where T is the return type of f. This is the “result” argument which has to point to the beginning of the vector where the gradient will be stored.

double rosenbrock_reverse(double x[], int size) {
    double sum = 0;
    auto rosenbrock_dX_dY = clad::gradient(rosenbrock_func);
    for (int i = 0; i < size-1; i++) {
        double result[2] = {};
        rosenbrock_dX_dY.execute(x[i],x[i+1], result);
        sum = sum + result[0] + result[1];
    }
    return sum;
}

double forward_time = timeForwardMode();

Elapsed time for rosenbrock_forward: 3.038005 s
The result of the function is 3232877463.475859.

Performance Comparison¶

The derivative function created by the forward-mode AD is guaranteed to have at most a constant factor (around 2-3) more arithmetical operations compared to the original function. Whilst for the reverse-mode AD for a function having N inputs and consisting of T arithmetical operations, computing its gradient takes a single execution of the reverse-mode AD and around 4T operations. In comparison, it would take N executions of the forward-mode, this requiring up to N3*T operations.

double difference = forward_time -  reverse_time;
printf("Forward - Reverse timing for an array of size: %d is: %fs\n", size, difference);

Forward - Reverse timing for an array of size: 100000000 is: 0.125806s

Clad Produced Code¶

We can now call rosenbrockX / rosenbrockY / rosenbrock_dX_dY.dump() to obtain a print out of the Clad’s generated code. As an illustration, the reverse-mode produced code is:

auto rosenbrock_dX_dY = clad::gradient(rosenbrock_func);
rosenbrock_dX_dY.dump();

The code is: void rosenbrock_func_grad(double x, double y, double *_result) {
    double _t2;
    double _t3;
    double _t4;
    double _t5;
    double _t6;
    double _t7;
    double _t8;
    double _t9;
    double _t10;
    _t3 = (x - 1);
    _t2 = (x - 1);
    _t7 = x;
    _t6 = x;
    _t5 = (y - _t7 * _t6);
    _t8 = 100 * _t5;
    _t10 = x;
    _t9 = x;
    _t4 = (y - _t10 * _t9);
    double rosenbrock_func_return = _t3 * _t2 + _t8 * _t4;
    goto _label0;
  _label0:
    {
        double _r0 = 1 * _t2;
        _result[0UL] += _r0;
        double _r1 = _t3 * 1;
        _result[0UL] += _r1;
        double _r2 = 1 * _t4;
        double _r3 = _r2 * _t5;
        double _r4 = 100 * _r2;
        _result[1UL] += _r4;
        double _r5 = -_r4 * _t6;
        _result[0UL] += _r5;
        double _r6 = _t7 * -_r4;
        _result[0UL] += _r6;
        double _r7 = _t8 * 1;
        _result[1UL] += _r7;
        double _r8 = -_r7 * _t9;
        _result[0UL] += _r8;
        double _r9 = _t10 * -_r7;
        _result[0UL] += _r9;
    }
}