Incremental compilation pipelines process code chunk-by-chunk by building an ever-growing translation unit. Code is then lowered into LLVM IR and subsequently run by the LLVM JIT. Such a pipeline allows the creation of efficient interpreters. The interpreter enables interactive exploration and makes the C++ language more user-friendly. The incremental compilation mode is used by the interactive C++ interpreter Cling, initially developed to enable interactive high-energy physics analysis in a C++ environment.
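For illustration, a minimal interactive session of the kind such a pipeline enables (the prompt and output shown are illustrative):

[clang-repl] #include <iostream>
[clang-repl] int sq(int x) { return x * x; }
[clang-repl] std::cout << sq(7) << "\n";
49

Each input is appended to the growing translation unit, lowered to LLVM IR, and executed by the JIT before the next prompt appears.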
Our group works to incorporate, and possibly redesign, parts of Cling in mainline Clang through a new tool, clang-repl. The project aims to generalize Cling's IncrementalCUDADeviceCompiler and add this functionality to clang-repl.
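As a sketch of the intended user experience, a hypothetical interactive CUDA session (the kernel, launch, and prompt are illustrative of what Cling's existing CUDA mode supports):

[clang-repl] __global__ void scale(double* v, double f) { v[threadIdx.x] *= f; }
[clang-repl] scale<<<1, 32>>>(dev_v, 2.0); // dev_v: previously allocated device memory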
There are several foreseen tasks:
Clang and LLVM provide access to C++ from other programming languages, but they currently only expose the declared public interfaces of such C++ code, even when they have parsed implementation details directly. Both the high-level and the low-level program representations have enough information to capture and expose more of such details to improve language interoperability. Examples include details of memory management, ownership transfer, thread safety, externalized side effects, etc. For example, if memory is allocated and returned, the caller needs to take ownership; if a function is pure, calls to it can be elided; if a call provides access to a data member, it can be reduced to an address lookup. The goal of this project is to develop APIs for libInterOp that are capable of extracting and exposing such information from the AST or from JIT-ed code, and to use it in cppyy (Python-C++ language bindings) as an exemplar. If time permits, the work can be extended to persist this information across translation units and use it on code compiled with Clang.
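A sketch of the kind of implementation details that could be surfaced (all declarations here are hypothetical and only illustrate the categories named above):

// Hypothetical C++ interfaces whose implementation details a binding
// layer such as cppyy could exploit:
struct Widget { int id; };
struct Record { int n; };

Widget* makeWidget() { return new Widget{0}; } // heap allocation returned:
                                               // the caller takes ownership
int& count(Record& r) { return r.n; }          // accessor: a call reduces to
                                               // an address lookup
double twice(double x) { return 2 * x; }       // pure: repeated calls with the
                                               // same argument can be elided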
There are several foreseen tasks:
Our group works to incorporate, and possibly redesign, parts of Cling in mainline Clang through a new tool, clang-repl. The project aims at implementing tutorials demonstrating the capabilities of the project and at investigating the adoption of clang-repl in xeus-cling.
There are several foreseen tasks:
The Clang compiler is part of the LLVM compiler infrastructure and supports various languages such as C, C++, ObjC and ObjC++. The design of LLVM and Clang enables them to be used as libraries, and has led to the creation of an entire compiler-assisted ecosystem of tools. The relatively friendly codebase of Clang and advancements in the JIT infrastructure in LLVM further enable research into different methods for processing C++ by blurring the boundary between compile time and runtime. Challenges include incremental compilation and fitting compile/link time optimizations into a more dynamic environment.
Our group works to incorporate, and possibly redesign, parts of Cling in mainline Clang through a new tool, clang-repl. The project aims at the design and implementation of robust autocompletion when users type C++ at the prompt of clang-repl. For example:
[clang-repl] class MyLongClassName {};
[clang-repl] My<tab>
// list of suggestions.
There are several foreseen tasks:
Investigate clang's existing code-completion interface (clang -code-completion-at=file:line:column).
Expose the completion facilities through libInterpreter.
Implement context-aware completion, for example (see the command-line example below):
[clang-repl] struct S {S* operator+(S&) { return nullptr;}};
[clang-repl] S a, b;
[clang-repl] v = a + <tab> // shows b as the only acceptable choice here.
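For reference, clang's completion interface can already be driven from the command line; given a file ending at the partially typed My from the first example (file name shown is illustrative, output abridged to the relevant entry):

$ clang++ -fsyntax-only -Xclang -code-completion-at=repl.cpp:2:1 repl.cpp
COMPLETION: MyLongClassName : MyLongClassName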
In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. Automatic differentiation is an alternative technique to symbolic differentiation and numerical differentiation (the method of finite differences). Clad is based on Clang, which provides the necessary facilities for code transformation. The AD library is able to differentiate non-trivial functions, find partial derivatives for trivial cases, and has good unit-test coverage.
Vector mode support will facilitate the computation of gradients using the forward mode AD in a single pass and thus without explicitly performing differentiation n times for n function arguments. The major benefit of using vector mode is that computationally expensive operations do not need to be recomputed n times for n function arguments.
For example, if we want to compute df/dx and df/dy of a function f(x, y) using the forward mode AD in Clad, then currently we need to explicitly differentiate f two times. Vector mode will allow the generation of f_d(x, y) such that we will be able to get partial derivatives with respect to all the function arguments (the gradient) in a single call.
After successful completion of the project, the following code snippet should work as expected:
#include <clad/Differentiator/Differentiator.h>
#include <iostream>

double someComputationalIntensiveFn();

double fn(double x, double y) {
  double t = someComputationalIntensiveFn(); // should be computed only once
                                             // in the derived function.
  double res = 2 * t * x + 3 * t * x * y;
  return res;
}

int main() {
  auto d_fn = clad::differentiate(fn, "x, y");
  double d_x = 0, d_y = 0;
  d_fn.execute(3, 5, &d_x, &d_y);
  std::cout << "Derivative of fn wrt x: " << d_x << "\n";
  std::cout << "Derivative of fn wrt y: " << d_y << "\n";
}
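For intuition, a hand-written sketch (not actual Clad output) of the single-pass derivative that vector mode could generate for fn:

// Hypothetical hand-derived equivalent of the generated f_d(x, y):
// res = 2*t*x + 3*t*x*y, so dres/dx = 2*t + 3*t*y and dres/dy = 3*t*x.
void fn_d(double x, double y, double *d_x, double *d_y) {
  double t = someComputationalIntensiveFn(); // evaluated once, shared by both
  *d_x = 2 * t + 3 * t * y;                  // partial derivatives computed
  *d_y = 3 * t * x;                          // together in a single pass
}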
The project consists of the following tasks:
Extend clad::differentiate to support vector mode.
Clad currently only supports differentiation with respect to single-dimensional arrays. Support for differentiation with respect to pointers is limited as well. This project aims to add support for multi-dimensional arrays (and pointers) in Clad.
After successful completion of the project, the following code snippet should work as expected:
#include <iostream>
#include "clad/Differentiator/Differentiator.h"

double fn(double arr[5][5]) {
  double res = 1 * arr[0][0] + 2 * arr[1][1] + 4 * arr[2][2];
  return res * 2;
}

int main() {
  auto d_fn = clad::gradient(fn);
  double arr[5][5] = {{1, 2, 3, 4, 5},
                      {6, 7, 8, 9, 10},
                      {11, 12, 13, 14, 15},
                      {16, 17, 18, 19, 20},
                      {21, 22, 23, 24, 25}};
  double d_arr[5][5] = {};
  d_fn.execute(arr, d_arr);
  std::cout << "Derivative of fn wrt arr[0][0]: " << d_arr[0][0] << "\n"; // 2
  std::cout << "Derivative of fn wrt arr[1][1]: " << d_arr[1][1] << "\n"; // 4
  return 0;
}
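For intuition, a hand-written equivalent (not actual Clad output) of the gradient the generated code must accumulate for fn above:

// fn returns 2 * (1*arr[0][0] + 2*arr[1][1] + 4*arr[2][2]), hence:
void fn_grad(double arr[5][5], double d_arr[5][5]) {
  d_arr[0][0] += 1 * 2; // printed above as 2
  d_arr[1][1] += 2 * 2; // printed above as 4
  d_arr[2][2] += 4 * 2;
}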
The project consists of the following tasks:
Clad can also annotate given source code with floating-point error-estimation code. This allows Clad to compute any floating-point-related errors in the given function on the fly, to reason about the numerical stability of the function, and to analyze the sensitivity of the variables involved.
The idea behind this project is to develop benchmarks, improve the floating-point error-estimation framework as necessary, find compelling real-world use cases for the tool, and investigate the possibility of performing lossy compression with it.
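For context, error-estimation frameworks of this kind typically start from the standard model of floating-point arithmetic, stated here as background rather than as Clad's exact model: every elementary operation is exact up to a relative rounding error,

fl(a op b) = (a op b) * (1 + delta), with |delta| <= eps (the machine epsilon),

and these per-operation errors are propagated through the program, weighted by the sensitivities (derivatives) that Clad already computes, to bound the accumulated error of the result.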
On successful completion of the project, the framework should have a sufficiently large set of benchmarks and example usages. Moreover, the framework should be able to run the following code as expected:
#include <iostream>
#include "clad/Differentiator/Differentiator.h"

// Some complicated function made up of doubles.
double someFunc(double F1[], double F2[], double V3[], double COUP1, double COUP2)
{
  double cI = 1;
  double TMP3;
  double TMP4;
  TMP3 = (F1[2] * (F2[4] * (V3[2] + V3[5]) + F2[5] * (V3[3] + cI * (V3[4]))) +
          F1[3] * (F2[4] * (V3[3] - cI * (V3[4])) + F2[5] * (V3[2] - V3[5])));
  TMP4 = (F1[4] * (F2[2] * (V3[2] - V3[5]) - F2[3] * (V3[3] + cI * (V3[4]))) +
          F1[5] * (F2[2] * (-V3[3] + cI * (V3[4])) + F2[3] * (V3[2] + V3[5])));
  return (-1.) * (COUP2 * (+cI * (TMP3) + 2. * cI * (TMP4)) + cI * (TMP3 * COUP1));
}

int main() {
  auto df = clad::estimate_error(someFunc);
  // This call should generate a report to decide
  // which variables can be downcast to a float.
  df.execute(args...);
}
The project consists of the following tasks:
The LHC smashes groups of protons together at close to the speed of light: 40 million times per second and with seven times the energy of the most powerful accelerators built up to now. Many of these will just be glancing blows, but some will be head-on collisions and very energetic. When this happens, some of the energy of the collision is turned into mass, and previously unobserved, short-lived particles, which could give clues about how Nature behaves at a fundamental level, fly out into the detector. Our work includes the experimental discovery of the Higgs boson, which led to the award of a Nobel Prize for the underlying theory that predicted the Higgs boson as an important piece of the Standard Model of particle physics.
CMS is a particle detector that is designed to see a wide range of particles and phenomena produced in high-energy collisions in the LHC. Like a cylindrical onion, different layers of detectors measure the different particles, and use this key data to build up a picture of events at the heart of the collision. CMSSW is a collection of software for the CMS experiment, responsible for the collection and processing of information about the particle collisions at the detector. CMSSW uses the ROOT framework to provide support for data storage and processing. ROOT relies on Cling, Clang, and LLVM to automatically build an efficient I/O representation of the necessary C++ objects. The I/O properties of each object are described in a compilable C++ file called a dictionary. ROOT's I/O dictionary system relies on C++ modules to improve the overall memory footprint when being used.
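For illustration, a minimal dictionary selection file of the kind rootcling consumes (the class name is hypothetical):

// MyLinkDef.h: tells rootcling which C++ entities need I/O dictionaries.
#ifdef __ROOTCLING__
#pragma link C++ class MyVector+; // '+' selects the modern streamer machinery
#endif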
The few runtime failures in the modules integration builds of CMSSW are due to dictionaries that cannot be found in the modules system. These dictionaries are present, as the mainstream system is able to find them using a broader search. The modules setup in ROOT needs to be extended to track dictionary<->module mappings for C++ entities that introduce synonyms rather than declarations (for example, using MyVector = std::vector<A<B>>; where the dictionaries of A and B are elsewhere).
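A minimal sketch of the problematic shape (all names are illustrative):

#include <vector>

// Defined elsewhere, with dictionaries living in other modules:
template <typename T> struct A { T value; };
struct B {};

// Introduces only a synonym, not a new declaration; the modules system
// currently has no dictionary mapping to find for it:
using MyVector = std::vector<A<B>>;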
The project consists of the following tasks:
When an alias such as using MyVector = std::vector<A<B>>; is declared, we should store its ODRHash in the respective dictionary file as a number attached to a special variable which can be retrieved at symbol-scanning time.
Cling is an interactive C++ interpreter, built on top of the Clang and LLVM compiler infrastructure. Cling realizes the read-eval-print loop (REPL) concept in order to leverage rapid application development. Implemented as a small extension to LLVM and Clang, the interpreter reuses their strengths, such as the praised concise and expressive compiler diagnostics.
The project foresees enhancing the GitHub Actions infrastructure by adding development process automation tools:
code coverage reporting (codecov)
static code analysis (clang-tidy)
coding convention checks (clang-format)
Since the development of Cling started, it has gained new features that enable new workflows. One such feature is CUDA mode, which allows you to interactively develop and run CUDA code on Nvidia GPUs. Another is the redefinition of functions, variables, classes and more, bypassing the one-definition rule of C++. This feature enables comfortable rapid prototyping in C++. Currently, the two features cannot be used together, because parsing and executing CUDA code behaves differently compared to pure C++.
The task is to adapt the redefinition feature of the pure C++ mode for the CUDA mode. To do this, the student must develop solutions to known and unknown problems caused by parsing and executing CUDA code.
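For illustration, the redefinition feature in pure C++ mode enables sessions like the following (prompt and output shown are illustrative):

[cling]$ double f() { return 1.0; }
[cling]$ double f() { return 42.0; } // accepted: redefinition shadows the old f
[cling]$ f()
(double) 42.000000

The goal is for the same interaction to work when the entities involved are CUDA kernels or other CUDA-specific constructs.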
Last year, thanks to Lucas Calmolezi and GSoC, the usage of boost in CMSSW was modernized, improving the C++ modules support of the local boost fork.
Many of the accumulated local patches add missing includes to the relevant boost header files. The candidate should start by proposing the existing patches to the boost community, and then try to compile more boost-specific modules, which is mostly a mechanical task. The student should be ready to work towards making the C++ module files more efficient and less duplicated, and should be prepared to write a progress report and present the results.
The LLVM BuildingAJIT tutorial series teaches readers to build their own JIT class from scratch using LLVM's ORC APIs; however, the tutorial chapters have not kept pace with recent API improvements. Bring the existing tutorial chapters up to speed, write up a new chapter on lazy compilation (chapter code already available), or write a new chapter from scratch.
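For orientation, a minimal sketch of the modern ORC entry point the updated chapters would build toward (error handling is compressed with cantFail; the IR-adding and symbol-lookup steps are elided):

#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/Support/TargetSelect.h"

int main() {
  llvm::InitializeNativeTarget();
  llvm::InitializeNativeTargetAsmPrinter();

  // Build a JIT instance with the defaults for the host process.
  auto J = llvm::cantFail(llvm::orc::LLJITBuilder().create());

  // Typical next steps, elided here:
  //   llvm::cantFail(J->addIRModule(std::move(TSM))); // add a ThreadSafeModule
  //   auto Sym = llvm::cantFail(J->lookup("entry"));  // resolve a JIT'd symbol
  return 0;
}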
JITLink is LLVM's new JIT linker API, the low-level API that transforms compiler output (relocatable object files) into ready-to-execute bytes in memory. To do this, JITLink's generic linker algorithm needs to be specialized to support the target object format (COFF, ELF, MachO) and architecture (arm, arm64, i386, x86-64). LLVM already has mature implementations of JITLink for MachO/arm64 and MachO/x86-64, and a relatively new implementation for ELF/x86-64. Write a JITLink implementation for a missing target that interests you. If you choose to implement support for a new architecture using the ELF or MachO formats, you will be able to reuse the existing generic code for these formats. If you want to implement support for a new target using the COFF format, you will need to write both the generic COFF support code and the architecture support code for your chosen architecture.
Write a JITLink specialization for a not-yet-supported format/architecture.