Open Projects

Design and Develop a CUDA engine working along with C/C++ mode in clang-repl

Description

Incremental compilation pipelines process code chunk-by-chunk by building an ever-growing translation unit. Code is then lowered into the LLVM IR and subsequently run by the LLVM JIT. Such a pipeline allows creation of efficient interpreters. The interpreter enables interactive exploration and makes the C++ language more user friendly. The incremental compilation mode is used by the interactive C++ interpreter, Cling, initially developed to enable interactive high-energy physics analysis in a C++ environment.

Our group is working to incorporate and possibly redesign parts of Cling in the Clang mainline through a new tool, clang-repl. The project aims to generalize Cling's IncrementalCUDADeviceCompiler and add this functionality to clang-repl.

Task ideas and expected results

There are several foreseen tasks:

  • Write a detailed request for comment (RFC) document on the design choices and gather feedback from the LLVM community.
  • Implement the necessary functionality to support the existing test cases.
  • Develop clang-repl-based tutorials for the CUDA backend.
  • Investigate the requirements for supporting a HIP backend.
  • Demonstrate a CUDA-executed gradient computed by the Clad automatic differentiation plugin.
  • Present the work at the relevant meetings and conferences.
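
A possible end state, sketched as a hypothetical clang-repl session (the exact CUDA syntax and launch semantics at the prompt are part of the RFC design work, not an existing feature):

```
[clang-repl] __global__ void inc(int *v) { v[threadIdx.x] += 1; }
[clang-repl] int *d = nullptr; cudaMalloc(&d, 4 * sizeof(int));
[clang-repl] inc<<<1, 4>>>(d);
[clang-repl] cudaDeviceSynchronize();
```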

Implement libInterOp API exposing memory, ownership and thread safety information

Description

Incremental compilation pipelines process code chunk-by-chunk by building an ever-growing translation unit. Code is then lowered into the LLVM IR and subsequently run by the LLVM JIT. Such a pipeline allows creation of efficient interpreters. The interpreter enables interactive exploration and makes the C++ language more user friendly. The incremental compilation mode is used by the interactive C++ interpreter, Cling, initially developed to enable interactive high-energy physics analysis in a C++ environment.

Clang and LLVM provide access to C++ from other programming languages, but they currently expose only the declared public interfaces of such C++ code, even when they have parsed implementation details directly. Both the high-level and the low-level program representations have enough information to capture and expose more of these details and thereby improve language interoperability. Examples include details of memory management, ownership transfer, thread safety, externalized side effects, etc. For example, if memory is allocated and returned, the caller needs to take ownership; if a function is pure, the call can be elided; if a call provides access to a data member, it can be reduced to an address lookup. The goal of this project is to develop APIs for libInterOp capable of extracting and exposing such information from the AST or from JIT-ed code, and to use it in cppyy (Python-C++ language bindings) as an exemplar. If time permits, the work can be extended to persist this information across translation units and to use it on code compiled with Clang.

Task ideas and expected results

There are several foreseen tasks:

  • Collect and categorize possible exposed interop information kinds
  • Write one or more facilities to extract necessary implementation details
  • Design a language-independent interface to expose this information
  • Integrate the work in clang-repl and Cling
  • Implement and demonstrate its use in cppyy as an exemplar
  • Present the work at the relevant meetings and conferences.

Tutorial development with clang-repl

Description

Incremental compilation pipelines process code chunk-by-chunk by building an ever-growing translation unit. Code is then lowered into the LLVM IR and subsequently run by the LLVM JIT. Such a pipeline allows creation of efficient interpreters. The interpreter enables interactive exploration and makes the C++ language more user friendly. The incremental compilation mode is used by the interactive C++ interpreter, Cling, initially developed to enable interactive high-energy physics analysis in a C++ environment.

Our group is working to incorporate and possibly redesign parts of Cling in the Clang mainline through a new tool, clang-repl. The project aims at implementing tutorials demonstrating the capabilities of the project and investigating the adoption of clang-repl in xeus-cling.

Task ideas and expected results

There are several foreseen tasks:

  • Write several tutorials demonstrating the current capabilities of clang-repl.
  • Investigate the requirements for adding clang-repl as a backend to xeus-cling.
  • Implement the xeus kernel protocol for clang-repl.
  • Complete a blog post about clang-repl and possibly Jupyter.
  • Present the work at the relevant meetings and conferences.

Implement autocompletion in clang-repl

Description

The Clang compiler is part of the LLVM compiler infrastructure and supports various languages such as C, C++, ObjC and ObjC++. The design of LLVM and Clang enables them to be used as libraries, and has led to the creation of an entire compiler-assisted ecosystem of tools. The relatively friendly codebase of Clang and advancements in the JIT infrastructure in LLVM further enable research into different methods for processing C++ by blurring the boundary between compile time and runtime. Challenges include incremental compilation and fitting compile/link time optimizations into a more dynamic environment.

Incremental compilation pipelines process code chunk-by-chunk by building an ever-growing translation unit. Code is then lowered into the LLVM IR and subsequently run by the LLVM JIT. Such a pipeline allows creation of efficient interpreters. The interpreter enables interactive exploration and makes the C++ language more user friendly. The incremental compilation mode is used by the interactive C++ interpreter, Cling, initially developed to enable interactive high-energy physics analysis in a C++ environment.

Our group is working to incorporate and possibly redesign parts of Cling in the Clang mainline through a new tool, clang-repl. The project aims at the design and implementation of robust autocompletion when users type C++ at the prompt of clang-repl. For example:

[clang-repl] class MyLongClassName {};
[clang-repl] My<tab>
// list of suggestions.

Task ideas and expected results

There are several foreseen tasks:

  • Research the current approaches for autocompletion in clang, such as clang -code-completion-at=file:line:column.
  • Implement a version of the autocompletion support using the partial translation unit infrastructure in clang’s libInterpreter.
  • Investigate the requirements for semantic autocompletion, which takes into account the exact grammar position and semantics of the code. E.g.:
    [clang-repl] struct S {S* operator+(S&) { return nullptr;}};
    [clang-repl] S a, b;
    [clang-repl] auto v = a + <tab> // shows b as the only acceptable choice here.
    
  • Present the work at the relevant meetings and conferences

Implement vector mode in forward mode automatic differentiation in Clad

Description

In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. Automatic differentiation is an alternative to symbolic differentiation and numerical differentiation (the method of finite differences). Clad is based on Clang, which provides the necessary facilities for code transformation. The AD library can differentiate non-trivial functions, find partial derivatives for trivial cases, and has good unit-test coverage.

Vector mode support will facilitate the computation of gradients using the forward mode AD in a single pass and thus without explicitly performing differentiation n times for n function arguments. The major benefit of using vector mode is that computationally expensive operations do not need to be recomputed n times for n function arguments.

For example, if we want to compute df/dx and df/dy of a function f(x, y) using the forward mode AD in Clad, then currently we need to explicitly differentiate f two times. Vector mode will allow the generation of f_d(x, y) such that we will be able to get partial derivatives with respect to all the function arguments (gradient) in a single call.

After successful completion of the project the code snippet should work as expected:

#include <clad/Differentiator/Differentiator.h>
#include <iostream>

double someComputationalIntensiveFn();

double fn(double x, double y) {
  double t = someComputationalIntensiveFn(); // should be computed only once
                                             // in the derived function.
  double res = 2 * t * x + 3 * t * x * y;
  return res;
}

int main() {
  auto d_fn = clad::differentiate(fn, "x, y");
  double d_x = 0, d_y = 0;
  d_fn.execute(3, 5, &d_x, &d_y);
  std::cout << "Derivative of fn wrt d_x: " << d_x << "\n";
  std::cout << "Derivative of fn wrt d_y: " << d_y << "\n";
}

Task ideas and expected results

The project consists of the following tasks:

  • Extend and generalize our ForwardModeVisitor to produce a single function with the directional derivatives.
  • Add a new mode to the top-level clad interface clad::differentiate for vector mode.
  • Extend the unit test coverage.
  • Develop tutorials and documentation.
  • Present the work at the relevant meetings and conferences.

Add support for differentiating with respect to multidimensional arrays (or pointers) in Clad

Description

In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. Automatic differentiation is an alternative to symbolic differentiation and numerical differentiation (the method of finite differences). Clad is based on Clang, which provides the necessary facilities for code transformation. The AD library can differentiate non-trivial functions, find partial derivatives for trivial cases, and has good unit-test coverage.

Clad currently only supports differentiation with respect to single-dimensional arrays. Support for differentiation with respect to pointers is limited as well. This project aims to add support for multi-dimensional arrays (and pointers) in Clad.

After successful completion of the project the code snippet should work as expected:

#include <iostream>
#include "clad/Differentiator/Differentiator.h"

double fn(double arr[5][5]) {
  double res = 1 * arr[0][0] + 2 * arr[1][1] + 4 * arr[2][2];
  return res * 2;
}

int main() {
  auto d_fn = clad::gradient(fn);
  double arr[5][5] = {{1, 2, 3, 4, 5},
                      {6, 7, 8, 9, 10},
                      {11, 12, 13, 14, 15},
                      {16, 17, 18, 19, 20},
                      {21, 22, 23, 24, 25}};
  double d_arr[5][5] = {};
  d_fn.execute(arr, d_arr);
  std::cout << "Derivative of d_fn wrt arr[0][0]: " << d_arr[0][0] << "\n"; // 2
  std::cout << "Derivative of d_fn wrt arr[1][1]: " << d_arr[1][1] << "\n"; // 4
  return 0;
}

Task ideas and expected results

The project consists of the following tasks:

  • Add support for differentiation with respect to multidimensional arrays (and pointers) in the reverse mode.
  • Add support for differentiation with respect to multidimensional arrays (and pointers) in the forward mode.
  • Extend the unit test coverage.
  • Develop tutorials and documentation.
  • Present the work at the relevant meetings and conferences.

Broaden the Scope for the Floating-Point Error Estimation Framework in Clad

Description

In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. Automatic differentiation is an alternative to symbolic differentiation and numerical differentiation (the method of finite differences). Clad is based on Clang, which provides the necessary facilities for code transformation. The AD library can differentiate non-trivial functions, find partial derivatives for trivial cases, and has good unit-test coverage.

Clad can also annotate given source code with floating-point error estimation code, which lets it compute any floating-point related errors in the given function on the fly. This allows Clad to reason about the numerical stability of the given function and to analyze the sensitivity of the variables involved.

The idea behind this project is to develop benchmarks and improve the floating-point error estimation framework as necessary. Moreover, the project involves finding compelling real-world use cases of the tool and investigating the possibility of performing lossy compression with it.

On successful completion of the project, the framework should have a sufficiently large set of benchmarks and example usages. Moreover, the framework should be able to run the following code as expected:

#include <iostream>
#include "clad/Differentiator/Differentiator.h"

// Some complicated function made up of doubles.
double someFunc(double F1[], double F2[], double V3[], double COUP1, double COUP2)
{
  double cI = 1;
  double TMP3;
  double TMP4;
  TMP3 = (F1[2] * (F2[4] * (V3[2] + V3[5]) + F2[5] * (V3[3] + cI * (V3[4]))) +
          F1[3] * (F2[4] * (V3[3] - cI * (V3[4])) + F2[5] * (V3[2] - V3[5])));
  TMP4 = (F1[4] * (F2[2] * (V3[2] - V3[5]) - F2[3] * (V3[3] + cI * (V3[4]))) +
          F1[5] * (F2[2] * (-V3[3] + cI * (V3[4])) + F2[3] * (V3[2] + V3[5])));
  return (-1.) * (COUP2 * (+cI * (TMP3) + 2. * cI * (TMP4)) +
                  cI * (TMP3 * COUP1));
}

int main() {
  auto df = clad::estimate_error(someFunc);
  // This call should generate a report to decide
  // which variables can be downcast to a float.
  df.execute(args...);
}

Task ideas and expected results

The project consists of the following tasks:

  • Add at least 5 benchmarks and compare the framework’s correctness and performance against them.
  • Compile at least 3 real-world examples that are complex enough to demonstrate the capabilities of the framework.
  • Solve any general-purpose issues that come up with Clad during the process.
  • Prepare demos and carry out development needed for lossy compression.

Improve robustness of dictionary to module lookups in ROOT

Description

The LHC smashes groups of protons together at close to the speed of light: 40 million times per second and with seven times the energy of the most powerful accelerators built up to now. Many of these will just be glancing blows, but some will be head-on collisions and very energetic. When this happens, some of the energy of the collision is turned into mass, and previously unobserved, short-lived particles – which could give clues about how Nature behaves at a fundamental level – fly out into the detector. Our work includes the experimental discovery of the Higgs boson, which led to the award of a Nobel Prize for the underlying theory that predicted the Higgs boson as an important piece of the Standard Model of particle physics.

CMS is a particle detector designed to see a wide range of particles and phenomena produced in high-energy collisions in the LHC. Like a cylindrical onion, different layers of detectors measure the different particles, and use this key data to build up a picture of events at the heart of the collision. CMSSW is a collection of software for the CMS experiment, responsible for the collection and processing of information about the particle collisions at the detector. CMSSW uses the ROOT framework to provide support for data storage and processing. ROOT relies on Cling, Clang, and LLVM for automatically building an efficient I/O representation of the necessary C++ objects. The I/O properties of each object are described in a compilable C++ file called a dictionary. ROOT's I/O dictionary system relies on C++ modules to improve the overall memory footprint when it is used.

The few runtime failures in the modules integration builds of CMSSW are due to dictionaries that cannot be found in the modules system. These dictionaries are present, and the mainstream system is able to find them using a broader search. The modules setup in ROOT needs to be extended to track dictionary<->module mappings for C++ entities that introduce synonyms rather than declarations (e.g. using MyVector = std::vector<A<B>>; where the dictionaries of A and B are elsewhere).

Task ideas and expected results

The project consists of the following tasks:

  • For an alias declaration of the kind using MyVector = std::vector<A<B>>;, store its ODRHash in the respective dictionary file as a number attached to a special variable which can be retrieved at symbol-scanning time.
  • Track down the test failures of CMSSW and check if the proposed implementation works.
  • Develop tutorials and documentation.
  • Present the work at the relevant meetings and conferences.

Optimize ROOT use of modules for large codebases (e.g., CMSSW)

Description

The LHC smashes groups of protons together at close to the speed of light: 40 million times per second and with seven times the energy of the most powerful accelerators built up to now. Many of these will just be glancing blows, but some will be head-on collisions and very energetic. When this happens, some of the energy of the collision is turned into mass, and previously unobserved, short-lived particles – which could give clues about how Nature behaves at a fundamental level – fly out into the detector. Our work includes the experimental discovery of the Higgs boson, which led to the award of a Nobel Prize for the underlying theory that predicted the Higgs boson as an important piece of the Standard Model of particle physics.

CMS is a particle detector designed to see a wide range of particles and phenomena produced in high-energy collisions in the LHC. Like a cylindrical onion, different layers of detectors measure the different particles, and use this key data to build up a picture of events at the heart of the collision. CMSSW is a collection of software for the CMS experiment, responsible for the collection and processing of information about the particle collisions at the detector. CMSSW uses the ROOT framework to provide support for data storage and processing. ROOT relies on Cling, Clang, and LLVM for automatically building an efficient I/O representation of the necessary C++ objects. The I/O properties of each object are described in a compilable C++ file called a dictionary. ROOT's I/O dictionary system relies on C++ modules to improve the overall memory footprint when it is used.

One source of performance loss is the need for symbol lookups across the very large set of CMSSW modules. ROOT needs to be improved to optimize this lookup so that an edm::X lookup does not pull in all modules that define the namespace edm.

Task ideas and expected results

The project consists of the following tasks:

  • Develop an extension to the GlobalModuleIndex infrastructure in clang which keeps track of the DeclKind of the identifiers so that we can later ignore the identifiers that declare a namespace.
  • Track down the test failures of CMSSW and check if the proposed implementation works.
  • Develop tutorials and documentation.
  • Present the work at the relevant meetings and conferences.

Add initial integration of Clad with Enzyme

Description

In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. Automatic differentiation is an alternative to symbolic differentiation and numerical differentiation (the method of finite differences). Clad is based on Clang, which provides the necessary facilities for code transformation. The AD library can differentiate non-trivial functions, find partial derivatives for trivial cases, and has good unit-test coverage. Enzyme is a prominent autodiff framework which works on LLVM IR.

Clad and Enzyme can be considered a C++ frontend and a backend automatic differentiation framework, respectively. In many cases, when Clad needs to fall back to numeric differentiation, it can instead try configuring and using Enzyme to perform the automatic differentiation at a lower level.

Task ideas and expected results

Understand how both systems work. Define the Enzyme configuration requirements and enable Clad to communicate efficiently with Enzyme. That may require several steps: start building and using the optimization pass of Enzyme as part of the Clad toolchain; use Enzyme to cross-validate derivative results; etc.

Improve Cling’s Development Lifecycle

Description

Cling is an interactive C++ interpreter, built on top of the Clang and LLVM compiler infrastructure. Cling realizes the read-eval-print loop (REPL) concept in order to enable rapid application development. Implemented as a small extension to LLVM and Clang, the interpreter reuses their strengths, such as the praised concise and expressive compiler diagnostics.

Task ideas and expected results

The project foresees to enhance the Github Actions infrastructure by adding development process automation tools:

  • Code Coverage information (codecov)
  • Static code analysis (clang-tidy)
  • Coding conventions checks (clang-format)
  • Release binary upload automation
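
As a rough sketch of the direction (workflow name, action versions, and branch are illustrative, not an existing configuration), the coding-conventions check could be wired up as a GitHub Actions job like:

```yaml
name: checks
on: [push, pull_request]
jobs:
  format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with: { fetch-depth: 0 }
      - name: clang-format check
        # git clang-format reports formatting diffs against the merge base
        run: git clang-format --diff origin/master
```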

Allow redefinition of CUDA functions in Cling

Description

Cling is an interactive C++ interpreter, built on top of the Clang and LLVM compiler infrastructure. Cling realizes the read-eval-print loop (REPL) concept in order to enable rapid application development. Implemented as a small extension to LLVM and Clang, the interpreter reuses their strengths, such as the praised concise and expressive compiler diagnostics.

Since the development of Cling started, it has gained new features that enable new workflows. One of these features is the CUDA mode, which allows you to interactively develop and run CUDA code on Nvidia GPUs. Another feature is the redefinition of functions, variables, classes and more, bypassing the one-definition rule of C++. This feature enables comfortable rapid prototyping in C++. Currently, the two features cannot be used together because parsing and executing CUDA code behaves differently compared to pure C++.

Task ideas and expected results

The task is to adapt the redefinition feature of the pure C++ mode for the CUDA mode. To do this, the student must develop solutions to known and unknown problems caused by the differences in how CUDA code is parsed and executed.
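
For illustration, this is roughly what redefinition looks like in pure C++ mode (hypothetical transcript; output formatting approximates Cling's), and what should eventually also work for __global__ kernels in CUDA mode:

```
[cling] int f() { return 1; }
[cling] int f() { return 2; }  // accepted: the new definition shadows the old one
[cling] f()
(int) 2
```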

Developing C++ modules support in CMSSW and Boost

Description

The LHC smashes groups of protons together at close to the speed of light: 40 million times per second and with seven times the energy of the most powerful accelerators built up to now. Many of these will just be glancing blows but some will be head on collisions and very energetic. When this happens some of the energy of the collision is turned into mass and previously unobserved, short-lived particles – which could give clues about how Nature behaves at a fundamental level - fly out and into the detector. Our work includes the experimental discovery of the Higgs boson, which leads to the award of a Nobel prize for the underlying theory that predicted the Higgs boson as an important piece of the standard model theory of particle physics.

CMS is a particle detector that is designed to see a wide range of particles and phenomena produced in high-energy collisions in the LHC. Like a cylindrical onion, different layers of detectors measure the different particles, and use this key data to build up a picture of events at the heart of the collision.

Last year, thanks to Lucas Calmolezi and GSoC, the usage of Boost in CMSSW was modernized. It improved the C++ modules support of the local Boost fork.

Task ideas and expected results

Many of the accumulated local patches add missing includes to the relevant Boost header files. The candidate should start by proposing the existing patches to the Boost community, then try to compile more Boost-specific modules, which is mostly a mechanical task. The student should be ready to work towards making the C++ module files more efficient, containing fewer duplications, and should be prepared to write a progress report and present the results.

Implement a shared-memory based JITLinkMemoryManager for out-of-process JITting

Description

LLVM’s JIT uses the JITLinkMemoryManager interface to allocate both working memory (where the JIT fixes up the relocatable objects produced by the compiler) and target memory (where the JIT’d code will reside in the target). JITLinkMemoryManager instances are also responsible for transporting fixed-up code from working memory to target memory. LLVM has an existing cross-process allocator that uses remote procedure calls (RPC) to allocate and copy bytes to the target process; however, a more attractive solution (when the JIT and target process share the same physical memory) would be to use shared memory pages to avoid copies between processes.

Task ideas and expected results

Implement a shared-memory based JITLinkMemoryManager:

  • Write generic LLVM APIs for shared memory allocation.
  • Write a JITLinkMemoryManager that uses these generic APIs to allocate shared working-and-target memory.
  • Make an extensive performance study of the approach.

Modernize the LLVM “Building A JIT” tutorial series

Description

The LLVM BuildingAJIT tutorial series teaches readers to build their own JIT class from scratch using LLVM’s ORC APIs; however, the tutorial chapters have not kept pace with recent API improvements. The project is to bring the existing tutorial chapters up to speed, write up a new chapter on lazy compilation (chapter code already available), or write a new chapter from scratch.

Task ideas and expected results

  • Update chapter text for Chapters 1-3 – Easy, but offers a chance to get up-to-speed on the APIs.
  • Write chapter text for Chapter 4 – Chapter code is already available, but no chapter text exists yet.
  • Write a new chapter from scratch – E.g. How to write an out-of-process JIT, or how to directly manipulate the JIT’d instruction stream using the ObjectLinkingLayer::Plugin API.

Write JITLink support for a new format/architecture

Description

JITLink is LLVM’s new JIT linker API – the low-level API that transforms compiler output (relocatable object files) into ready-to-execute bytes in memory. To do this JITLink’s generic linker algorithm needs to be specialized to support the target object format (COFF, ELF, MachO), and architecture (arm, arm64, i386, x86-64). LLVM already has mature implementations of JITLink for MachO/arm64 and MachO/x86-64, and a relatively new implementation for ELF/x86-64. Write a JITLink implementation for a missing target that interests you. If you choose to implement support for a new architecture using the ELF or MachO formats then you will be able to re-use the existing generic code for these formats. If you want to implement support for a new target using the COFF format then you will need to write both the generic COFF support code and the architecture support code for your chosen architecture.

Task ideas and expected results

Write a JITLink specialization for a not-yet-supported format/architecture.

Extend clang AST to provide information for the type as written in template instantiations

Description

When instantiating a template, the template arguments are canonicalized before being substituted into the template pattern. Clang does not preserve type sugar when subsequently accessing members of the instantiation.

std::vector<std::string> vs;
int n = vs.front(); // bad diagnostic: [...] aka 'std::basic_string<char>' [...]

template<typename T> struct Id { typedef T type; };
Id<size_t>::type // just 'unsigned long', 'size_t' sugar has been lost

Clang should “re-sugar” the type when performing member access on a class template specialization, based on the type sugar of the accessed specialization. The type of vs.front() should be std::string, not std::basic_string<char, […]>.

Suggested design approach: add a new type node to represent template argument sugar, and implicitly create an instance of this node whenever a member of a class template specialization is accessed. When performing a single-step desugar of this node, lazily create the desugared representation by propagating the sugared template arguments onto inner type nodes (and in particular, replacing Subst*Parm nodes with the corresponding sugar). When printing the type for diagnostic purposes, use the annotated type sugar to print the type as originally written.

For good results, template argument deduction will also need to be able to deduce type sugar (and reconcile cases where the same type is deduced twice with different sugar).

Task ideas and expected results

Diagnostics preserve type sugar even when accessing members of a template specialization. T<unsigned long> and T<size_t> are still the same type and the same template instantiation, but T<unsigned long>::type single-step desugars to ‘unsigned long’ and T<size_t>::type single-step desugars to ‘size_t’.

Infrastructure: Improve Cling’s packaging system cpt

Description

Cling has a flexible tool, cpt, which can build and package binaries. It is implemented in Python.

Task ideas and expected results

There are several improvements that can be made to cpt:

  • Fix deb package creation
  • Rewrite parts of cpt
    • Use an if __name__ == "__main__" block as the program execution starting point
    • No mutating global variables
    • Minimize use of subprocess
    • Make cpt flake8-compliant (flexible error/violation codes)
    • Revamp the argument parser (examine the possibility of dependent arguments)