Open Projects

ROOT Superbuilds

ROOT is a framework for data processing, born at CERN, at the heart of the research on high-energy physics. Every day, thousands of physicists use ROOT applications to analyze their data or to perform simulations. The ROOT software framework is foundational for the HEP ecosystem, providing capabilities such as IO, a C++ interpreter, GUI, and math libraries. It uses object-oriented concepts and build-time modules to layer between components. We believe additional layering formalisms will benefit ROOT and its users.

Currently, ROOT is built as an all-in-one package. We are working to create a modular version of ROOT that provides a minimal base install with core features and then adds functionality via incremental builds. This requires introducing new layering mechanisms and extending the functionality of the existing ROOT package manager prototype.

Task ideas and expected results

  • Enhance the existing CMake build system rules to enable lazy building of packages
  • Bootstrap “RootBase”
  • Demonstrate “layered” lazy builds

Improving performance of BioDynaMo using ROOT C++ Modules

ROOT is a framework for data processing, born at CERN, at the heart of the research on high-energy physics. Every day, thousands of physicists use ROOT applications to analyze their data or to perform simulations. The ROOT software framework is foundational for the HEP ecosystem, providing capabilities such as IO, a C++ interpreter, GUI, and math libraries. It uses object-oriented concepts and build-time modules to layer between components. We believe additional layering formalisms will benefit ROOT and its users.

BioDynaMo is an agent-based simulation platform that enables users to perform simulations of previously unachievable scale and complexity, making it possible to tackle challenging scientific research questions. The project has a wide range of applications in cancer research, epidemiology, and social sciences.

BioDynaMo incorporates ROOT for several crucial functionalities such as statistical analysis, random number generation, C++-based Jupyter notebooks, and IO. Some features rely on efficient reflection information about BioDynaMo’s and user-defined C++ classes. This project is about improving the performance of the reflection system by upgrading to C++ modules.

Task ideas and expected results

  • Rework the CMake rules to efficiently incorporate ROOT via FetchContent
  • Replace invocations of genreflex with rootcling
  • Enable C++ modules in rootcling
  • Produce a comparison report

Using ROOT in the field of genome sequencing

ROOT is a framework for data processing, born at CERN, at the heart of the research on high-energy physics. Every day, thousands of physicists use ROOT applications to analyze their data or to perform simulations. The ROOT software framework is foundational for the HEP ecosystem, providing capabilities such as IO, a C++ interpreter, GUI, and math libraries. It uses object-oriented concepts and build-time modules to layer between components. We believe additional layering formalisms will benefit ROOT and its users.

ROOT has broader scientific uses than the field of high-energy physics. Several studies have shown promising applications of the ROOT I/O system in the field of genome sequencing. This project is about extending the capability developed in GeneROOT and better understanding the requirements of the field.

Task ideas and expected results

  • Reproduce the results from previous comparisons against the ROOT master
  • Investigate changing the compression strategies
  • Investigate different ROOT file splitting techniques
  • Produce a comparison report

Implement Differentiation of the Kokkos Framework

In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. Automatic differentiation is an alternative to symbolic differentiation and numerical differentiation (the method of finite differences). Clad is based on Clang, which provides the necessary facilities for code transformation. The AD library can differentiate non-trivial functions, find partial derivatives for trivial cases, and has good unit test coverage.
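
For a flavor of the API, here is a minimal sketch; the translation unit must be compiled with the clad plugin enabled, and the function f below is purely illustrative:

#include "clad/Differentiator/Differentiator.h"
#include <cstdio>

double f(double x, double y) { return x * x + y; }

int main() {
  // Ask clad for df/dx; the result is a callable object that runs the
  // generated derivative code.
  auto df_dx = clad::differentiate(f, "x");
  printf("df/dx(3, 4) = %.2f\n", df_dx.execute(3, 4)); // 2*x = 6.00
  return 0;
}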

The Kokkos C++ Performance Portability Ecosystem is a production-level solution for writing modern C++ applications in a hardware-agnostic way. It is part of the US Department of Energy's Exascale Computing Project – the leading effort in the US to prepare the HPC community for the next generation of supercomputing platforms. The ecosystem consists of multiple libraries addressing the primary concerns for developing and maintaining applications in a portable way. The three main components are the Kokkos Core programming model, the Kokkos Kernels math libraries, and the Kokkos profiling and debugging tools.

The Kokkos framework is used in several domains, including climate modeling, where gradients are an important part of the simulation process. This project aims at teaching Clad to differentiate Kokkos entities in a performance-portable way.
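
As a concrete but hypothetical target, the kind of code one would eventually like Clad to handle looks like the sketch below; the dot function and parameter choices are illustrative only, and Clad does not support this today:

#include <Kokkos_Core.hpp>
#include "clad/Differentiator/Differentiator.h"

// Illustrative target: a function whose body uses a Kokkos parallel construct.
double dot(double a, double b, int n) {
  double sum = 0.;
  Kokkos::parallel_reduce(
      "dot", n,
      KOKKOS_LAMBDA(const int i, double& partial) { partial += a * b; },
      sum);
  return sum;
}

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    auto grad = clad::gradient(dot, "a,b"); // the project's goal in one line
    double da = 0., db = 0.;
    grad.execute(2., 3., 4, &da, &db);
  }
  Kokkos::finalize();
  return 0;
}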

Task ideas and expected results

  • Implement common test cases for Kokkos in Clad
  • Add support for Kokkos functors
  • Add support for Kokkos lambdas
  • Incorporate the changes from the initial Kokkos PR
  • Enhance existing benchmarks demonstrating effectiveness of Clad for Kokkos
  • [Stretch goal] Performance benchmarks

Integrate a Large Language Model with the xeus-cpp Jupyter kernel

xeus-cpp is a Jupyter kernel for C++ based on xeus, the native implementation of the Jupyter protocol. It enables users to write and execute C++ code interactively and see the results immediately. This REPL (read-eval-print loop) nature allows rapid prototyping and iteration without the overhead of compiling and running separate C++ programs, and it brings C++ and Python integration within a single Jupyter environment.

This project aims to integrate a large language model, such as Bard/Gemini, with the xeus-cpp Jupyter kernel. This integration will enable users to interactively generate and execute code in C++ leveraging the assistance of the language model. Upon successful integration, users will have access to features such as code autocompletion, syntax checking, semantic understanding, and even code generation based on natural language prompts.

Task ideas and expected results

  • Design and implement mechanisms to interface the large language model with the xeus-cpp kernel. Jupyter-AI might be used as a motivating example
  • Develop functionalities within the kernel to utilize the language model for code generation based on natural language descriptions and suggestions for autocompletion.
  • Comprehensive documentation and thorough testing/CI additions to ensure reliability.
  • [Stretch Goal] After achieving the previous milestones, the student can work on specializing the model for enhanced syntax and semantic understanding capabilities by using xeus notebooks as datasets.

Implementing missing features in xeus-cpp

xeus-cpp is a Jupyter kernel for C++ based on xeus, the native implementation of the Jupyter protocol. It enables users to write and execute C++ code interactively and see the results immediately. This REPL (read-eval-print loop) nature allows rapid prototyping and iteration without the overhead of compiling and running separate C++ programs, and it brings C++ and Python integration within a single Jupyter environment.

xeus-cpp is the successor of xeus-clang-repl and xeus-cling. The goal of this project is to advance xeus-cpp's feature support to the level of what is supported in xeus-clang-repl and xeus-cling.

Task ideas and expected results

  • Fix occasional bugs in clang-repl directly in upstream LLVM
  • Implement the value printing logic
  • Advance the wasm infrastructure
  • Write tutorials and demonstrators
  • Complete the transition of xeus-clang-repl to xeus-cpp

Adoption of CppInterOp in ROOT

Incremental compilation pipelines process code chunk-by-chunk by building an ever-growing translation unit. Code is then lowered into the LLVM IR and subsequently run by the LLVM JIT. Such a pipeline allows creation of efficient interpreters. The interpreter enables interactive exploration and makes the C++ language more user friendly. The incremental compilation mode is used by the interactive C++ interpreter, Cling, initially developed to enable interactive high-energy physics analysis in a C++ environment. The CppInterOp library provides a minimalist approach for other languages to identify C++ entities (variables, classes, etc.). This enables interoperability with C++ code, bringing the speed and efficiency of C++ to simpler, more interactive languages like Python. CppInterOp provides primitives that are well suited for exposing reflection information.
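
For a flavor of these primitives, here is a rough sketch; the function names are taken from CppInterOp's public header, but the exact signatures should be treated as assumptions of this sketch:

#include "clang/Interpreter/CppInterOp.h"
#include <iostream>

int main() {
  Cpp::CreateInterpreter();                  // start an incremental compilation pipeline
  Cpp::Declare("class Widget { int id; };"); // hand a chunk of C++ to the interpreter
  Cpp::TCppScope_t widget = Cpp::GetScope("Widget");
  std::cout << Cpp::IsClass(widget) << '\n'; // reflection query on the entity: prints 1
  return 0;
}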

ROOT is an open-source data analysis framework used by high-energy physics and other fields to analyze petabytes of data scientifically. The framework provides support for data storage and processing by relying on Cling, Clang, and LLVM to automatically build an efficient I/O representation of the necessary C++ objects. The I/O properties of each object are described in a compilable C++ file called a dictionary. ROOT's I/O dictionary system relies on reflection information provided by Cling and Clang. However, the reflection information system has grown organically, and ROOT's core/metacling system has become hard to maintain and integrate.

The goal of this project is to integrate CppInterOp in ROOT where possible.

Task ideas and expected results

  • Complete prerequisite infrastructure items such as Windows and WASM support
  • Make GitHub Actions workflows reusable across multiple repositories
  • Sync the state of the dynamic library manager with the one in ROOT
  • Sync the state of CallFunc/JitCall with the one in ROOT
  • Prepare the infrastructure for upstreaming to LLVM
  • Propose an RFC and make a presentation to the ROOT development team

Implement CppInterOp API exposing memory, ownership and thread safety information

Incremental compilation pipelines process code chunk-by-chunk by building an ever-growing translation unit. Code is then lowered into the LLVM IR and subsequently run by the LLVM JIT. Such a pipeline allows creation of efficient interpreters. The interpreter enables interactive exploration and makes the C++ language more user friendly. The incremental compilation mode is used by the interactive C++ interpreter, Cling, initially developed to enable interactive high-energy physics analysis in a C++ environment.

Clang and LLVM provide access to C++ from other programming languages, but they currently expose only the declared public interfaces of such C++ code, even when they have parsed implementation details directly. Both the high-level and the low-level program representations have enough information to capture and expose more of these details and improve language interoperability. Examples include details of memory management, ownership transfer, thread safety, externalized side effects, etc. For example, if memory is allocated and returned, the caller needs to take ownership; if a function is pure, its calls can be elided; if a call provides access to a data member, it can be reduced to an address lookup.
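
A minimal illustration of why the declared interface is not enough: both functions below (hypothetical names) expose the identical signature, yet their implementations imply opposite ownership contracts, which is exactly the kind of detail this project wants to surface.

#include <vector>

struct Widget { int id = 0; };
std::vector<Widget> registry(1);

// Identical declared interface, Widget*(), but different ownership semantics:
Widget* make_widget() { return new Widget(); }   // caller takes ownership (must delete)
Widget* peek_widget() { return &registry[0]; }   // borrowed pointer (must not delete)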

The goal of this project is to develop APIs for CppInterOp capable of extracting and exposing such information from the AST or from JIT-ed code, and to use them in cppyy (Python-C++ language bindings) as an exemplar. If time permits, extend the work to persist this information across translation units and use it on code compiled with Clang.

Task ideas and expected results

  • Collect and categorize possible exposed interop information kinds
  • Write one or more facilities to extract necessary implementation details
  • Design a language-independent interface to expose this information
  • Integrate the work in clang-repl and Cling
  • Implement and demonstrate its use in cppyy as an exemplar
  • Present the work at the relevant meetings and conferences.

Implement and improve an efficient, layered tape with prefetching capabilities

In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. Automatic differentiation is an alternative technique to Symbolic differentiation and Numerical differentiation (the method of finite differences). Clad is based on Clang which provides the necessary facilities for code transformation. The AD library can differentiate non-trivial functions, to find a partial derivative for trivial cases and has good unit test coverage.

The most heavily used entity in AD is a stack-like data structure called a tape. The first-in last-out access pattern, which naturally occurs when storing intermediate values for reverse-mode AD, lends itself to asynchronous storage: asynchronous prefetching of values during the reverse pass allows checkpoints deeper in the stack to be stored furthest away in the memory hierarchy. Checkpointing provides a mechanism to parallelize segments of a function that can be executed on independent cores; inserting checkpoints in these segments using separate tapes keeps memory local and avoids sharing memory between cores.

We will research techniques for local parallelization of the gradient reverse pass and extend them to achieve better scalability and/or lower constant overheads on CPUs and potentially accelerators. We will evaluate techniques for efficient memory use, such as multi-level checkpointing support. Combining already-developed techniques will allow executing gradient segments across different cores or in heterogeneous computing systems. These techniques must be robust and user-friendly, and minimize required application code and build system changes.

This project aims to improve the efficiency of the clad tape and generalize it into a tool-agnostic facility that could be used outside of clad as well.
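
For illustration, a tape built from connected fixed-size slabs can grow without re-allocating and copying existing values. This is a minimal sketch; the names and layout are not Clad's actual tape implementation:

#include <cstddef>
#include <vector>

template <typename T, std::size_t SlabSize = 4096>
class SlabTape {
  std::vector<T*> slabs_; // owning pointers to fixed-size slabs
  std::size_t size_ = 0;  // total number of stored values
public:
  SlabTape() = default;
  SlabTape(const SlabTape&) = delete;            // sketch: non-copyable
  SlabTape& operator=(const SlabTape&) = delete;
  ~SlabTape() { for (T* s : slabs_) delete[] s; }

  void push(const T& value) {
    if (size_ == slabs_.size() * SlabSize)
      slabs_.push_back(new T[SlabSize]); // grow without touching existing slabs
    slabs_[size_ / SlabSize][size_ % SlabSize] = value;
    ++size_;
  }
  T pop() { // first-in last-out access used by the reverse pass
    --size_;
    return slabs_[size_ / SlabSize][size_ % SlabSize];
  }
};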

Task ideas and expected results

  • Optimize the current tape to avoid re-allocation on resize, using connected slabs of memory instead (as sketched above)
  • Enhance existing benchmarks demonstrating the efficiency of the new tape
  • Make the tape thread-safe
  • Implement a multilayer tape stored in memory and on disk
  • [Stretch goal] Support CPU-GPU transfer of the tape
  • [Stretch goal] Add infrastructure to enable checkpointing offload to the new tape
  • [Stretch goal] Performance benchmarks

Enabling CUDA compilation on Cppyy-Numba generated IR

Cppyy is an automatic, run-time, Python-C++ bindings generator, for calling C++ from Python and Python from C++. Initial support has been added that allows Cppyy to hook into the high-performance Python compiler, Numba, which compiles looped code containing C++ objects/methods/functions defined via Cppyy into fast machine code. Because Numba compiles the code in loops into machine code, it crosses the language barrier just once and avoids the large slowdowns that accumulate from repeated calls between the two languages. Numba uses its own lightweight version of the LLVM compiler toolkit (llvmlite) that generates an intermediate code representation (LLVM IR), which is also supported by the Clang compiler capable of compiling CUDA C++ code.

The project aims to demonstrate Cppyy's capability to provide CUDA paradigms to Python users without any compromise in performance. Upon successful completion, a proof of concept along the lines of the code snippet below should become possible:

import cppyy
import cppyy.numba_ext  # teaches Numba how to handle Cppyy-bound entities
import numba

cppyy.cppdef('''
__global__ void MatrixMul(float* A, float* B, float* out) {
    // kernel logic for matrix multiplication
}
''')

@numba.njit
def run_cuda_mul(A, B, out):
    # Allocate memory for input and output arrays on GPU (d_A, d_B, d_out)
    # Define grid and block dimensions (griddim, blockdim)
    # Launch the kernel
    MatrixMul[griddim, blockdim](d_A, d_B, d_out)

Task ideas and expected results

  • Add support for declaration and parsing of Cppyy-defined CUDA code in the Numba extension.
  • Design and develop a CUDA compilation and execution mechanism.
  • Prepare proper tests and documentation.

Cppyy STL/Eigen - Automatic conversion and plugins for Python based ML-backends

Cppyy is an automatic, run-time, Python-C++ bindings generator, for calling C++ from Python and Python from C++. Cppyy uses pythonized wrappers of useful classes from libraries like STL and Eigen that allow the user to utilize them on the Python side. Current support covers STL container types like std::vector, std::map, and std::tuple, and the matrix-based classes in Eigen/Dense. These cppyy objects can be plugged into idiomatic expressions that expect Python built-in types. This behaviour is achieved by growing Pythonic methods like __len__ while also retaining the C++ methods like size.

Efficient and automatic conversion between C++ and Python is essential for high-performance cross-language support. It eliminates overheads arising from iterative initialization, such as Eigen's comma-initialization, and opens up new avenues for using Cppyy's bindings in tools that perform numerical operations, transformations, or optimization.

The on-demand C++ infrastructure wrapped by idiomatic Python enables new techniques in ML tools like JAX/CUTLASS. This project allows that C++ infrastructure to be plugged in at the service of users seeking high-performance library primitives that are unavailable in Python.

Task ideas and expected results

  • Extend STL support for std::vectors of arbitrary dimensions
  • Improve the initialization approach for Eigen classes
  • Develop a streamlined interconversion mechanism between Python builtin-types, numpy.ndarray, and STL/Eigen data structures
  • Implement experimental plugins that perform basic computational operations in frameworks like JAX
  • Work on integrating these plugins with toolkits like CUTLASS that utilise the bindings to provide a Python API

Improve the LLVM.org Website Look and Feel

The llvm.org website serves as the central hub for information about the LLVM project, encompassing project details, current events, and relevant resources. Over time, the website has evolved organically, prompting the need for a redesign to enhance its modernity, structure, and ease of maintenance.

The goal of this project is to create a contemporary and coherent static website that reflects the essence of LLVM.org. This redesign aims to improve navigation, taxonomy, content discoverability, and overall usability. Given the critical role of the website in the community, efforts will be made to engage with community members, seeking consensus on the proposed changes.

LLVM’s current website is a complicated mesh of uncoordinated pages with inconsistent, static links pointing to both internal and external sources. The website has grown substantially and haphazardly since its inception.

It requires a major UI and UX overhaul to be able to better serve the LLVM community.

Based on a preliminary site audit, following are some of the problem areas that need to be addressed.

Sub-Sites: Many of the sections/sub-sites have a completely different UI/UX (e.g., main, clang, lists, foundation, circt, lnt, and docs). Sub-sites are divided into 8 separate repos and use different technologies including Hugo, Jekyll, etc.

Navigation: On-page navigation is inconsistent and confusing. Cross-sub-site navigation is inconsistent, unintuitive, and sometimes non-existent. Important subsections often depend on static links within (seemingly random) pages. Multi-word menu items are center-aligned and flow out of margins.

Pages: Many large write-ups lack pagination, section boundaries, etc., making them seem more intimidating than they really are. Several placeholder pages re-route to 3rd party services, adding bloat and inconsistency.

Search: Search options are placed in unintuitive locations, like the bottom of the side panel, or from static links to redundant pages. Some pages have no search options at all. With multiple sections of the website hosted in separate projects/repos, cross-sub-site search doesn’t seem possible.

Expected results: A modern, coherent-looking website that attracts prospective users and empowers the existing community with better navigation, taxonomy, content discoverability, and overall usability. It should also include a more descriptive Contribution Guide (example) to help novice contributors, as well as to help maintain a coherent site structure.

Since the website is critical infrastructure and most of the community will have an opinion, this project should engage with the community to build consensus on the steps being taken.

Task ideas and expected results

  • Conduct a comprehensive content audit of the existing website.
  • Select appropriate technologies, preferably static site generators like Hugo or Jekyll.
  • Advocate for a separation of data and visualization, utilizing formats such as YAML and Markdown to facilitate content management without direct HTML coding.
  • Present three design mockups for the new website, fostering open discussions and allowing time for alternative proposals from interested parties.
  • Implement the chosen design, incorporating valuable feedback from the community.
  • Collaborate with content creators to integrate or update content as needed.

The successful candidate should commit to regular participation in weekly meetings, deliver presentations, and contribute blog posts as requested. Additionally, they should demonstrate the ability to navigate the community process with patience and understanding.

On Demand Parsing in Clang

Clang, like any C++ compiler, parses a sequence of characters as they appear, linearly. The linear character sequence is then turned into tokens and an AST before lowering to machine code. In many cases the end-user code uses only a small portion of the C++ entities from the entire translation unit, but the user still pays the price of compiling all of the redundancies.

This project proposes to process expensive-to-compile C++ entities when they are used rather than eagerly. This approach is already adopted in Clang's CodeGen, where it allows Clang to produce code only for what is being used. On-demand compilation is expected to significantly reduce peak memory during compilation and improve compile time for translation units that sparsely use their contents. In addition, it would have a significant impact on interactive C++, where header inclusion essentially becomes a no-op and entities are parsed only on demand.

The Cling interpreter implements a very naive but efficient cross-translation-unit lazy-compilation optimization that scales across hundreds of libraries in the field of high-energy physics.

  // A.h
  #include <string>
  #include <vector>
  template <class T, class U = int> struct AStruct {
    void doIt() { /*...*/ }
    const char* data;
    // ...
  };

  template<class T, class U = AStruct<T>>
  inline void freeFunction() { /* ... */ }
  inline void doit(unsigned N = 1) { /* ... */ }

  // Main.cpp
  #include "A.h"
  int main() {
    doit();
    return 0;
  }

This pathological example expands to 37,253 lines of code to process. Cling builds an index (it calls it an autoloading map) that contains only forward declarations of these C++ entities, amounting to 3,000 lines of code.

The index looks like:

  // A.h.index
  namespace std{inline namespace __1{template <class _Tp, class _Allocator> class __attribute__((annotate("$clingAutoload$vector")))  __attribute__((annotate("$clingAutoload$A.h")))  __vector_base;
    }}
  ...
  template <class T, class U = int> struct __attribute__((annotate("$clingAutoload$A.h"))) AStruct;

Upon a request for the complete type of an entity, Cling includes the relevant header file to get it. There are several trivial workarounds to deal with default arguments and default template arguments, as they now appear on both the forward declaration and the definition.

Although the implementation could not be called a reference implementation, it shows that the Parser and the Preprocessor of Clang are relatively stateless and can be used to process character sequences that are not linear in their nature. In particular, namespace-scope definitions are relatively easy to handle, and it is not very difficult to return to namespace scope when we lazily parse something. For other contexts, such as local classes, we will have lost some essential information such as name lookup tables for local entities. However, these cases are probably not very interesting, as lazy parsing is probably worth doing only at the granularity of top-level entities.

Such an implementation can help with existing issues in the standard, such as CWG2335, under which the delayed portions of classes get parsed immediately when they are first needed, if that first usage precedes the end of the class. That should give good motivation to upstream all the operations needed to return to an enclosing scope and parse something.

Implementation approach:

Upon seeing a tag definition during parsing, we could create a forward declaration, record the token sequence, and mark it as a lazy definition. Later, upon a complete-type request, we could re-position the parser to parse the definition body. We already skip some of the template specializations in a similar way [commit, commit].

Another approach is for every lazily parsed entity to record its token stream and change the Toks stored on LateParsedDeclarations to optionally refer to a subsequence of the externally stored token sequence instead of storing its own sequence (or maybe change CachedTokens so it can do that transparently). One of the challenges would be that we currently modify the cached tokens list to append an “eof” token, but it should be possible to handle that in a different way.

In some cases, a class definition can affect its surrounding context in a few ways that require care:

1) struct X appearing inside the class can introduce the name X into the enclosing context.

2) static inline declarations can introduce global variables with non-constant initializers that may have arbitrary side-effects.
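
A minimal example of point (1): the elaborated-type-specifier struct X inside a class declares X in the enclosing namespace scope, so the name becomes visible outside the class.

  class C {
    struct X* ptr; // first declaration of ::X
  };
  X* global = nullptr; // OK: ::X was introduced by the member declaration above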

For point (2), there’s a more general problem: parsing any expression can trigger a template instantiation of a class template that has a static data member with an initializer that has side effects. Unlike the above two cases, there is probably no way to correctly detect and handle such cases by simple analysis of the token stream; actual semantic analysis is required. But perhaps, if such instantiations happen only in code that is itself unused, it wouldn’t be terrible for Clang to have a language mode that doesn’t guarantee that they actually happen.

An alternative and potentially more efficient implementation would be to make the lookup tables range-based, but we do not yet have even a prototype proving this could be a feasible approach.

Task ideas and expected results

  • Design and implementation of on-demand compilation for non-templated functions
  • Support non-templated structs and classes
  • Run performance benchmarks on relevant codebases and prepare report
  • Prepare a community RFC document
  • [Stretch goal] Support templates

The successful candidate should commit to regular participation in weekly meetings, deliver presentations, and contribute blog posts as requested. Additionally, they should demonstrate the ability to navigate the community process with patience and understanding.

Improve automatic differentiation of object-oriented paradigms using Clad

Clad is an automatic differentiation (AD) clang plugin for C++. Given a C++ source code of a mathematical function, it can automatically generate C++ code for computing derivatives of the function. Clad has found uses in statistical analysis and uncertainty assessment applications.

Object-oriented programming (OOP) provides a structured approach for complex use cases, allowing for modular components that can be reused and extended. OOP also allows for abstraction, which makes code easier to reason about and maintain. Gaining full OOP support is an open research area for automatic differentiation codes.

This project focuses on improving support for differentiating object-oriented constructs in Clad. This will allow users to seamlessly compute derivatives of the algorithms in their projects that use an object-oriented model. C++ object-oriented constructs include, but are not limited to: classes, inheritance, polymorphism, and related features such as operator overloading.
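
For illustration, the kind of code users would like to differentiate looks as follows. This is a sketch: differentiating overloaded operators like this is part of the project's goal, and the exact clad invocation shown is an assumption.

#include "clad/Differentiator/Differentiator.h"

class Paraboloid {
public:
  double a, b;
  Paraboloid(double a, double b) : a(a), b(b) {}
  double operator()(double x, double y) const { return a * x * x + b * y * y; }
};

int main() {
  auto grad = clad::gradient(&Paraboloid::operator());
  Paraboloid p(1., 2.);
  double dx = 0., dy = 0.;
  grad.execute(p, 3., 4., &dx, &dy); // the object is passed as the first argument
  return 0;
}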

Task ideas and expected results

There are several foreseen tasks:

  • Study the current object-oriented differentiable programming support in Clad. Prepare a report of missing constructs that should be added to support the automatic differentiation of object-oriented paradigms in both the forward mode AD and the reverse mode AD. Some of the missing constructs are: differentiation of constructors, limited support for differentiation of operator overloads, reference class members, and no way of specifying custom derivatives for constructors.
  • Add support for the missing constructs.
  • Add proper tests and documentation.

Enable reverse-mode automatic differentiation of (CUDA) GPU kernels using Clad

Clad is an automatic differentiation (AD) clang plugin for C++. Given a C++ source code of a mathematical function, it can automatically generate C++ code for computing derivatives of the function. Clad has found uses in statistical analysis and uncertainty assessment applications. In scientific computing and machine learning, GPU multiprocessing can provide a significant boost in performance and scalability. This project focuses on enabling the automatic differentiation of CUDA GPU kernels using Clad. This will allow users to take advantage of the power of GPUs while benefiting from the accuracy and speed of automatic differentiation.
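
As a concrete, hypothetical illustration of the goal (not something Clad supports today; the kernel below is illustrative only):

#include "clad/Differentiator/Differentiator.h"

// A CUDA kernel whose reverse-mode derivative the project aims to generate.
__global__ void square(double* in, double* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i] * in[i];
}

int main() {
  // The goal: clad generates a gradient kernel for square that users can
  // launch from host code like any other kernel. Launch-configuration
  // handling and device-memory plumbing are part of the design work.
  auto square_grad = clad::gradient(square);
  return 0;
}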

Task ideas and expected results

There are several foreseen tasks:

  • Research automatic differentiation of code involving CUDA GPU kernels. Prepare a report and an initial strategy to follow. This may involve brainstorming and the need for innovative solutions.
  • Enable reverse-mode automatic differentiation of CUDA GPU kernels and calls to CUDA GPU kernels from the host code.
  • Add proper tests and documentation.

Broaden the Scope for the Floating-Point Error Estimation Framework in Clad

In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. Automatic differentiation is an alternative to symbolic differentiation and numerical differentiation (the method of finite differences). Clad is based on Clang, which provides the necessary facilities for code transformation. The AD library can differentiate non-trivial functions, find partial derivatives for trivial cases, and has good unit test coverage.

Clad also possesses the capability of annotating given source code with floating-point error estimation code. This allows Clad to compute floating-point-related errors in the given function on the fly, to reason about the numerical stability of that function, and to analyze the sensitivity of the variables involved.

The idea behind this project is to develop benchmarks and improve the floating-point error estimation framework as necessary. Moreover, the project aims to find compelling real-world use cases of the tool and to investigate the possibility of performing lossy compression with it.

On successful completion of the project, the framework should have a sufficiently large set of benchmarks and example usages. Moreover, the framework should be able to run the following code as expected:

#include <iostream>
#include "clad/Differentiator/Differentiator.h"

// Some complicated function made up of doubles.
double someFunc(double F1[], double F2[], double V3[], double COUP1, double COUP2)
{
  double cI = 1;
  double TMP3;
  double TMP4;
  TMP3 = (F1[2] * (F2[4] * (V3[2] + V3[5]) + F2[5] * (V3[3] + cI * (V3[4]))) +
  F1[3] * (F2[4] * (V3[3] - cI * (V3[4])) + F2[5] * (V3[2] - V3[5])));
  TMP4 = (F1[4] * (F2[2] * (V3[2] - V3[5]) - F2[3] * (V3[3] + cI * (V3[4]))) +
  F1[5] * (F2[2] * (-V3[3] + cI * (V3[4])) + F2[3] * (V3[2] + V3[5])));
  return (-1.) * (COUP2 * (+cI * (TMP3) + 2. * cI * (TMP4)) + cI * (TMP3 *
  COUP1));
}

int main() {
  auto df = clad::estimate_error(someFunc);
  // This call should generate a report to decide
  // which variables can be downcast to a float.
  df.execute(args...);
}

Task ideas and expected results

The project consists of the following tasks:

  • Add at least 5 benchmarks and use them to verify the framework’s correctness and measure its performance.
  • Compile at least 3 real-world examples that are complex enough to demonstrate the capabilities of the framework.
  • Solve any general-purpose issues that come up with Clad during the process.
  • Prepare demos and carry out development needed for lossy compression.

Add support for consteval and constexpr functions in clad

In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. Automatic differentiation is an alternative to symbolic differentiation and numerical differentiation (the method of finite differences). Clad is based on Clang, which provides the necessary facilities for code transformation. The AD library can differentiate non-trivial functions, find partial derivatives for trivial cases, and has good unit test coverage.

C++ provides the specifiers consteval and constexpr to allow compile-time evaluation of functions. constexpr declares a possibility, i.e., the function will be evaluated at compile time when possible and at runtime otherwise, whereas consteval makes it mandatory, i.e., every call to the function must produce a compile-time constant.

The aim of this project is to ensure that the generated derivative functions follow the same semantics, i.e., if the primal function is evaluated at compile time (because of a constexpr or consteval specifier), then the generated derivative code should carry the same specifier so that it, too, can be evaluated at compile time.

This will enable Clad to demonstrate the advantages of performing automatic differentiation directly in the C++ frontend, taking full advantage of Clang's infrastructure.

After successful completion of the project, the following code snippet should work as expected:

#include <cstdio>
#include "clad/Differentiator/Differentiator.h"

constexpr double sq(double x) { return x*x; }

consteval double fn(double x, double y, double z) {
  double res = sq(x) + sq(y) + sq(z);
  return res;
}

int main() {
  auto d_fn = clad::gradient(fn);
  double dx = 0, dy = 0, dz = 0;
  d_fn.execute(3, 4, 5, &dx, &dy, &dz);
  printf("Gradient vector: [%.2f, %.2f, %.2f]", dx, dy, dz);
  return 0;
}

Task ideas and expected results

The project consists of the following tasks:

  • Add support for differentiation of consteval and constexpr functions in the forward mode.
  • Add support for differentiation of consteval and constexpr functions in the reverse mode.
  • Extend the unit test coverage.
  • Develop tutorials and documentation.
  • Present the work at the relevant meetings and conferences.

Improve robustness of dictionary to module lookups in ROOT

The LHC smashes groups of protons together at close to the speed of light: 40 million times per second and with seven times the energy of the most powerful accelerators built up to now. Many of these collisions are just glancing blows, but some are head-on and very energetic. When this happens, some of the energy of the collision is turned into mass, and previously unobserved, short-lived particles – which could give clues about how Nature behaves at a fundamental level – fly out into the detector. Our work includes the experimental discovery of the Higgs boson, which led to the award of a Nobel Prize for the underlying theory that predicted the Higgs boson as an important piece of the Standard Model of particle physics.

CMS is a particle detector designed to see a wide range of particles and phenomena produced in high-energy collisions in the LHC. Like a cylindrical onion, different layers of detectors measure the different particles, and this key data is used to build up a picture of events at the heart of the collision. CMSSW is a collection of software for the CMS experiment, responsible for the collection and processing of information about the particle collisions at the detector. CMSSW uses the ROOT framework to provide support for data storage and processing. ROOT relies on Cling, Clang, and LLVM to automatically build an efficient I/O representation of the necessary C++ objects. The I/O properties of each object are described in a compilable C++ file called a dictionary. ROOT's I/O dictionary system relies on C++ modules to improve the overall memory footprint.

The few runtime failures in the C++ modules integration builds of CMSSW are due to dictionaries that cannot be found by the modules system, even though they are present: the mainstream system is able to find them using a broader search. The modules setup in ROOT needs to be extended with a dictionary extension that tracks dictionary<->module mappings for C++ entities that introduce synonyms rather than declarations (e.g., using MyVector = std::vector<A<B>>; where the dictionaries of A and B are elsewhere).
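
A minimal illustration of the situation (hypothetical names):

#include <vector>

// This header introduces only a synonym: the dictionaries for A and B live
// in other modules, so a lookup through MyVector must still map back to them.
template <class T> struct A {};
struct B {};
using MyVector = std::vector<A<B>>;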

Task ideas and expected results

The project consists of the following tasks:

  • For an alias declaration of the kind using MyVector = std::vector<A<B>>;, store its ODRHash in the respective dictionary file as a number attached to a special variable that can be retrieved at symbol-scanning time.
  • Track down the test failures of CMSSW and check if the proposed implementation works.
  • Develop tutorials and documentation.
  • Present the work at the relevant meetings and conferences.

Enhance the incremental compilation error recovery in clang and clang-repl

The Clang compiler is part of the LLVM compiler infrastructure and supports various languages such as C, C++, ObjC and ObjC++. The design of LLVM and Clang enables them to be used as libraries, and has led to the creation of an entire compiler-assisted ecosystem of tools. The relatively friendly codebase of Clang and advancements in the JIT infrastructure in LLVM further enable research into different methods for processing C++ by blurring the boundary between compile time and runtime. Challenges include incremental compilation and fitting compile/link time optimizations into a more dynamic environment.

Incremental compilation pipelines process code chunk-by-chunk by building an ever-growing translation unit. Code is then lowered into the LLVM IR and subsequently run by the LLVM JIT. Such a pipeline allows creation of efficient interpreters. The interpreter enables interactive exploration and makes the C++ language more user friendly. The incremental compilation mode is used by the interactive C++ interpreter, Cling, initially developed to enable interactive high-energy physics analysis in a C++ environment.

Our group works to incorporate and possibly redesign parts of Cling in mainline Clang through a new tool, clang-repl. This project aims at enhancing the error recovery when users type C++ at the clang-repl prompt.
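
For illustration, an error-recovery scenario looks roughly like the session below (transcript approximate): after an erroneous line, the partially built state must be rolled back so that subsequent, correct input still works.

  clang-repl> int x = undefined_name;
  // error: use of undeclared identifier 'undefined_name'
  clang-repl> int x = 42;   // must succeed: the failed line left no stale 'x' behind
  clang-repl> x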

Task ideas and expected results

There are several tasks to improve the current rudimentary state of the error recovery:

  • Extend the test coverage for error recovery
  • Find and fix cases where there are bugs
  • Implement template instantiation error recovery support
  • Implement argument-dependent lookup (ADL) recovery support

Improve Cling’s Development Lifecycle

Cling is an interactive C++ interpreter, built on top of the Clang and LLVM compiler infrastructure. Cling realizes the read-eval-print loop (REPL) concept in order to enable rapid application development. Implemented as a small extension to LLVM and Clang, the interpreter reuses their strengths, such as the concise and expressive compiler diagnostics.

Task ideas and expected results

The project foresees enhancing the GitHub Actions infrastructure by adding development-process automation tools:

  • Code Coverage information (codecov)
  • Static code analysis (clang-tidy)
  • Coding conventions checks (clang-format)
  • Release binary upload automation

Allow redefinition of CUDA functions in Cling

Cling is an interactive C++ interpreter, built on top of the Clang and LLVM compiler infrastructure. Cling realizes the read-eval-print loop (REPL) concept in order to enable rapid application development. Implemented as a small extension to LLVM and Clang, the interpreter reuses their strengths, such as the concise and expressive compiler diagnostics.

Since the development of Cling started, it has gained new features that enable new workflows. One of these features is the CUDA mode, which allows users to interactively develop and run CUDA code on Nvidia GPUs. Another is the redefinition of functions, variables, classes, and more, bypassing the one-definition rule of C++. This feature enables comfortable rapid prototyping in C++. Currently, the two features cannot be used together, because parsing and executing CUDA code behaves differently compared to pure C++.
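
For illustration, redefinition in the pure C++ mode looks roughly like the session below (transcript approximate; definition shadowing must be enabled). The task is to make the same work for CUDA entities such as __global__ kernels.

  [cling]$ double f() { return 1.; }
  [cling]$ double f() { return 2.; }  // accepted: the new definition shadows the old one
  [cling]$ f()
  (double) 2.0000000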

Task ideas and expected results

The task is to adapt the redefinition feature of the pure C++ mode for the CUDA mode. To do this, the student must develop solutions to known and unknown problems arising from the differences in how CUDA code is parsed and executed.

Developing C++ modules support in CMSSW and Boost

The LHC smashes groups of protons together at close to the speed of light: 40 million times per second and with seven times the energy of the most powerful accelerators built up to now. Many of these collisions are just glancing blows, but some are head-on and very energetic. When this happens, some of the energy of the collision is turned into mass, and previously unobserved, short-lived particles – which could give clues about how Nature behaves at a fundamental level – fly out into the detector. Our work includes the experimental discovery of the Higgs boson, which led to the award of a Nobel Prize for the underlying theory that predicted the Higgs boson as an important piece of the Standard Model of particle physics.

CMS is a particle detector that is designed to see a wide range of particles and phenomena produced in high-energy collisions in the LHC. Like a cylindrical onion, different layers of detectors measure the different particles, and use this key data to build up a picture of events at the heart of the collision.

Last year, thanks to Lucas Calmolezi and GSoC, the usage of Boost in CMSSW was modernized, improving the C++ modules support of the local Boost fork.

Task ideas and expected results

Many of the accumulated local patches add missing includes to the relevant Boost header files. The candidate should start by proposing the existing patches to the Boost community, and then try to compile more Boost-specific modules, which is a mostly mechanical task. The student should be ready to work towards making the C++ module files more efficient and less duplicated, and should be prepared to write a progress report and present the results.

Modernize the LLVM “Building A JIT” tutorial series

The LLVM BuildingAJIT tutorial series teaches readers to build their own JIT class from scratch using LLVM’s ORC APIs; however, the tutorial chapters have not kept pace with recent API improvements. Bring the existing tutorial chapters up to speed, write up a new chapter on lazy compilation (chapter code is already available), or write a new chapter from scratch.

Task ideas and expected results

  • Update chapter text for Chapters 1-3 – Easy, but offers a chance to get up to speed on the APIs.
  • Write chapter text for Chapter 4 – Chapter code is already available, but no chapter text exists yet.
  • Write a new chapter from scratch – E.g., how to write an out-of-process JIT, or how to directly manipulate the JIT’d instruction stream using the ObjectLinkingLayer::Plugin API.