Tutorial level: Intro

# Game of Life on GPU - Interactive & Extensible¶

xeus-cling provides a Jupyter kernel for C++ with the help of the C++ interpreter cling and the native implementation of the Jupyter protocol xeus.

Within the xeus-cling framework, Clad can enable automatic differentiation (AD) such that users can automatically generate C++ code for their computation of derivatives of their functions.

## Rosenbrock Function¶

In mathematical optimization, the Rosenbrock function is a non-convex function used as a performance test problem for optimization problems. The function is defined as:

In :
#include "clad/Differentiator/Differentiator.h"
#include <chrono>

In :
// Rosenbrock function declaration
double rosenbrock_func(double x, double y) {
return (x - 1) * (x - 1) + 100 * (y - x * x) * (y - x * x);
}


In order to compute the function’s derivatives, we can employ both Clad’s Forward Mode or Reverse Mode as detailed below:

For a function f of several inputs and single (scalar) output, forward mode AD can be used to compute (or, in case of Clad, create a function) computing a directional derivative of f with respect to a single specified input variable. Moreover, the generated derivative function has the same signature as the original function f, however its return value is the value of the derivative.

In :
double rosenbrock_forward(double x[], int size) {
double sum = 0;
for (int i = 0; i < size-1; i++) {
double one = rosenbrockX.execute(x[i], x[i + 1]);
double two = rosenbrockY.execute(x[i], x[i + 1]);
sum = sum + one + two;
}
return sum;
}

In :
const int size = 100000000;
double Xarray[size];
for(int i=0;i<size;i++)
Xarray[i]=((double)rand()/RAND_MAX);

In :
double forward_time = timeForwardMode();

Out:
Elapsed time for rosenbrock_forward: 3.038005 s
The result of the function is 3232877463.475859.


Reverse-mode AD enables the gradient computation within a single pass of the computation graph of f using at most a constant factor (around 4) more arithmetical operations compared to the original function. While its constant factor and memory overhead is higher than that of the forward-mode, it is independent of the number of inputs.

Moreover, the generated function has void return type and same input arguments. The function has an additional argument of type T*, where T is the return type of f. This is the “result” argument which has to point to the beginning of the vector where the gradient will be stored.

In :
double rosenbrock_reverse(double x[], int size) {
double sum = 0;
for (int i = 0; i < size-1; i++) {
double result = {};
rosenbrock_dX_dY.execute(x[i],x[i+1], result);
sum = sum + result + result;
}
return sum;
}

In :
double forward_time = timeForwardMode();

Out:
Elapsed time for rosenbrock_forward: 3.038005 s
The result of the function is 3232877463.475859.


## Performance Comparison¶

The derivative function created by the forward-mode AD is guaranteed to have at most a constant factor (around 2-3) more arithmetical operations compared to the original function. Whilst for the reverse-mode AD for a function having N inputs and consisting of T arithmetical operations, computing its gradient takes a single execution of the reverse-mode AD and around 4T operations. In comparison, it would take N executions of the forward-mode, this requiring up to N3*T operations.

In :
double difference = forward_time -  reverse_time;
printf("Forward - Reverse timing for an array of size: %d is: %fs\n", size, difference);

Out:
Forward - Reverse timing for an array of size: 100000000 is: 0.125806s


We can now call rosenbrockX / rosenbrockY / rosenbrock_dX_dY.dump() to obtain a print out of the Clad’s generated code. As an illustration, the reverse-mode produced code is:

In :
auto rosenbrock_dX_dY = clad::gradient(rosenbrock_func);
rosenbrock_dX_dY.dump();

Out:
The code is: void rosenbrock_func_grad(double x, double y, double *_result) {
double _t2;
double _t3;
double _t4;
double _t5;
double _t6;
double _t7;
double _t8;
double _t9;
double _t10;
_t3 = (x - 1);
_t2 = (x - 1);
_t7 = x;
_t6 = x;
_t5 = (y - _t7 * _t6);
_t8 = 100 * _t5;
_t10 = x;
_t9 = x;
_t4 = (y - _t10 * _t9);
double rosenbrock_func_return = _t3 * _t2 + _t8 * _t4;
goto _label0;
_label0:
{
double _r0 = 1 * _t2;
_result[0UL] += _r0;
double _r1 = _t3 * 1;
_result[0UL] += _r1;
double _r2 = 1 * _t4;
double _r3 = _r2 * _t5;
double _r4 = 100 * _r2;
_result[1UL] += _r4;
double _r5 = -_r4 * _t6;
_result[0UL] += _r5;
double _r6 = _t7 * -_r4;
_result[0UL] += _r6;
double _r7 = _t8 * 1;
_result[1UL] += _r7;
double _r8 = -_r7 * _t9;
_result[0UL] += _r8;
double _r9 = _t10 * -_r7;
_result[0UL] += _r9;
}
}