Vectorized Forward Mode Automatic Differentiation is a computational technique that combines two powerful concepts: vectorization and forward mode automatic differentiation. This approach is used to efficiently compute derivatives of functions with respect to multiple input variables by taking advantage of both parallel processing capabilities and the structure of the computation graph.

In forward mode AD, a single pass differentiates the function with respect to only one input variable. In many cases, however, it is desirable to differentiate a function with respect to multiple input variables. One way to do this is to use a vectorized version of forward mode AD.

Without vector mode, computing the derivatives of a function with an n-dimensional input requires n forward passes, one for each input variable. In vector mode, these computations are batched together, and the function is differentiated with respect to multiple input variables in a single forward pass. This reduces the overhead of repeating expensive operations across multiple forward passes and makes better use of the hardware's vectorization capabilities.

The output is a vector of partial derivatives, one with respect to each input variable. Currently, Clad supports vector mode only for forward mode AD; a similar approach could also support a vectorized version of reverse mode AD.

The following code snippet shows how one can request Clad to use vector mode for differentiation:

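```
#include <cstdio>
#include "clad/Differentiator/Differentiator.h"

double prod(double x, double y, double z) { return x * y * z; }

int main() {
  // Request vector mode: differentiate prod w.r.t. x and y together.
  auto grad = clad::differentiate<clad::opts::vector_mode>(prod, "x,y");
  double x = 3.0, y = 4.0, z = 5.0;
  double dx = 0.0, dy = 0.0;
  // The derivatives are written through the pointers &dx and &dy.
  grad.execute(x, y, z, &dx, &dy);
  printf("d_x = %.2f, d_y = %.2f\n", dx, dy); // prints d_x = 20.00, d_y = 15.00
}
```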

In this example, we have used `clad::differentiate` to generate a function `grad` that calculates the derivatives of `prod` with respect to the variables `x` and `y`. The results are written to `dx` and `dy`, whose addresses are passed to `execute`. The significance of `clad::opts::vector_mode` is explained in the sections below.

To demonstrate how standard forward mode AD works, two code blocks are shown below: the first is the original function together with the call to Clad, and the second is the derivative function that Clad generates.

```
#include "clad/Differentiator/Differentiator.h"

double f(double x, double y, double z) {
  return x + y + z;
}

int main() {
  // Call Clad to generate the derivative of f wrt x.
  auto f_dx = clad::differentiate(f, "x");
  // Execute the generated derivative function.
  double dx = f_dx.execute(/*x=*/3, /*y=*/4, /*z=*/5);
}
```

`clad::differentiate` generates a function that computes the derivative of `f` with respect to `x`. The string argument `"x"` specifies the independent variable, i.e., the input with respect to which the derivative is taken.

For the call above, Clad generates the following derivative function:

```
double f_darg0(double x, double y, double z) {
double _d_x = 1;
double _d_y = 0;
double _d_z = 0;
return _d_x + _d_y + _d_z;
}
```

With vector mode, derivatives with respect to multiple parameters can be computed together, as shown below.

Using the same function as in the example above:

```
#include "clad/Differentiator/Differentiator.h"

double f(double x, double y, double z) {
  return x + y + z;
}

int main() {
  // Call Clad to generate the derivatives of f wrt x and z in vector mode.
  auto f_dx_dz = clad::differentiate<clad::opts::vector_mode>(f, "x,z");
  // Execute the generated derivative function; results go to dx and dz.
  double dx = 0, dz = 0;
  f_dx_dz.execute(/*x=*/3, /*y=*/4, /*z=*/5, &dx, &dz);
}
```

Here, `clad::differentiate` generates a function that computes the derivatives of `f` with respect to both `x` and `z`. The string argument `"x,z"` specifies both independent variables, and the `clad::opts::vector_mode` template option requests vector mode.

Calling `execute` on the generated object computes the derivatives and stores them in `dx` and `dz`, whose addresses are passed as the last two arguments.

For this call, Clad generates the following function:

```
void f_dvec_0_2(double x, double y, double z, double *_d_x, double *_d_z) {
  clad::array<double> d_vec_x = {1, 0};
  clad::array<double> d_vec_y = {0, 0};
  clad::array<double> d_vec_z = {0, 1};
  {
    clad::array<double> d_vec_ret = d_vec_x + d_vec_y + d_vec_z;
    *_d_x = d_vec_ret[0];
    *_d_z = d_vec_ret[1];
    return;
  }
}
```

**Note**: The main change in the interface is the `clad::opts::vector_mode` option.

Computing the gradient of a function with an n-dimensional input requires n forward passes in standard forward mode.

*Figure: Non-Vectorized Forward Mode Automatic Differentiation*

**Reference**: https://jnamaral.github.io/icpp20/slides/Hueckelheim_Vector.pdf

In the non-vectorized approach, each forward pass accumulates a single scalar derivative value at every node. The strategy is the same in every pass, but computing the partial derivatives w.r.t. the three scalar inputs of the function requires three separate passes.

**Can we combine these?**

*Figure: Vectorized Operations*

All the partial derivatives can instead be computed in a single forward pass: rather than a single scalar derivative, each node maintains a vector storing the complete gradient of that node's output w.r.t. all the input parameters. All operations become vector operations; for example, applying the sum rule results in the addition of vectors. Input nodes are initialized with one-hot vectors.

Computing a vector at each node takes more memory and more time than computing a scalar, and storing these vectors adds memory allocation calls. This overhead must be offset by gains in computational efficiency.

*Figure: Vectorized Operations*

In return, vector mode avoids recomputing expensive functions that the non-vectorized version would evaluate once per forward pass. It can also take advantage of the hardware's vectorization and parallelization capabilities (using SIMD techniques).