Open Projects

Using ROOT in the field of genome sequencing

ROOT is a framework for data processing, born at CERN, at the heart of research in high-energy physics (HEP). Every day, thousands of physicists use ROOT applications to analyze their data or to perform simulations. The ROOT software framework is foundational for the HEP ecosystem, providing capabilities such as I/O, a C++ interpreter, a GUI, and math libraries. It uses object-oriented concepts and build-time modules to layer its components. We believe additional layering formalisms will benefit ROOT and its users.

ROOT has broader scientific uses than high-energy physics. Several studies have shown promising applications of the ROOT I/O system in the field of genome sequencing. This project is about extending the capabilities developed in GeneROOT and better understanding the requirements of the field.

Task ideas and expected results

  • Reproduce the results from previous comparisons against the ROOT master
  • Investigate changing the compression strategies (see the sketch after this list)
  • Investigate different ROOT file splitting techniques
  • Produce a comparison report
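
The compression investigation, for instance, involves experiments of roughly the following shape. This is a minimal sketch against ROOT's public I/O API; the tree layout and the data shown are illustrative assumptions, not GeneROOT's actual schema:

  // write_reads.cxx - compare ROOT compression algorithms on sequencing data
  #include "TFile.h"
  #include "TTree.h"
  #include "Compression.h"
  #include <string>

  void write_reads(const char* path, int level) {
    TFile f(path, "RECREATE");
    // Swap in ROOT::kZLIB, ROOT::kLZ4, ROOT::kZSTD, ... and compare the
    // resulting file sizes and read/write throughput.
    f.SetCompressionSettings(ROOT::CompressionSettings(ROOT::kLZMA, level));
    TTree t("reads", "genome reads");
    std::string seq;
    t.Branch("seq", &seq);
    seq = "ACGTACGTAC";  // in practice, fill from a FASTQ/BAM source
    t.Fill();
    t.Write();
  }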

Implementing missing features in xeus-cpp

xeus-cpp is a Jupyter kernel for C++ based on xeus, a native implementation of the Jupyter protocol. It enables users to write and execute C++ code interactively and see the results immediately. This REPL (read-eval-print loop) nature allows rapid prototyping and iteration without the overhead of compiling and running separate C++ programs, and it brings C++ and Python together within a single Jupyter environment.
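
For example, a session might look like this (illustrative; the Out[2] display assumes the value-printing task listed below is in place):

  // In [1]: definitions persist across cells
  #include <vector>
  std::vector<int> v{1, 2, 3};

  // In [2]: the last expression of a cell is displayed by the kernel
  v.push_back(4);
  v.size()
  // Out[2]: 4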

xeus-cpp is the successor of xeus-clang-repl and xeus-cling. The goal of this project is to advance xeus-cpp's feature support to the extent of what is supported in xeus-clang-repl and xeus-cling.

Task ideas and expected results

  • Fix occasional bugs in clang-repl directly in upstream LLVM
  • Implement the value printing logic
  • Advance the WASM infrastructure
  • Write tutorials and demonstrators
  • Complete the transition of xeus-clang-repl to xeus-cpp

Adoption of CppInterOp in ROOT

Incremental compilation pipelines process code chunk by chunk by building an ever-growing translation unit. Code is then lowered into LLVM IR and subsequently run by the LLVM JIT. Such a pipeline allows the creation of efficient interpreters. The interpreter enables interactive exploration and makes the C++ language more user-friendly. The incremental compilation mode is used by the interactive C++ interpreter, Cling, initially developed to enable interactive high-energy physics analysis in a C++ environment. The CppInterOp library provides a minimalist approach for other languages to identify C++ entities (variables, classes, etc.). This enables interoperability with C++ code, bringing the speed and efficiency of C++ to simpler, more interactive languages like Python. CppInterOp provides primitives that are well suited to supplying reflection information.
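
A minimal sketch of these primitives in use (the names follow CppInterOp's public header, but exact signatures should be treated as approximate):

  #include "clang/Interpreter/CppInterOp.h"
  #include <cassert>

  int main() {
    Cpp::CreateInterpreter();
    Cpp::Declare("class Particle { double fMass; public: double mass(); };");
    Cpp::TCppScope_t klass = Cpp::GetScope("Particle");
    assert(Cpp::IsClass(klass));
    // ROOT's dictionary system needs exactly this kind of reflection:
    // enumerating members, bases, sizes, and so on, per C++ entity.
  }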

ROOT is an open-source data analysis framework used by high-energy physics and other fields to analyze petabytes of scientific data. The framework provides support for data storage and processing, relying on Cling, Clang, and LLVM to automatically build efficient I/O representations of the necessary C++ objects. The I/O properties of each object are described in a compilable C++ file called a dictionary. ROOT's I/O dictionary system relies on reflection information provided by Cling and Clang. However, this reflection system has grown organically, and ROOT's core/metacling component has become hard to maintain and integrate.

The goal of this project is to integrate CppInterOp in ROOT where possible.

Task ideas and expected results

  • Complete several prerequisite infrastructure items, such as Windows support and WASM support
  • Make GitHub Actions reusable across multiple repositories
  • Sync the state of the dynamic library manager with the one in ROOT
  • Sync the state of CallFunc/JitCall with the one in ROOT
  • Prepare the infrastructure for upstreaming to LLVM
  • Propose an RFC and make a presentation to the ROOT development team

Implement CppInterOp API exposing memory, ownership and thread safety information

Incremental compilation pipelines process code chunk by chunk by building an ever-growing translation unit. Code is then lowered into LLVM IR and subsequently run by the LLVM JIT. Such a pipeline allows the creation of efficient interpreters. The interpreter enables interactive exploration and makes the C++ language more user-friendly. The incremental compilation mode is used by the interactive C++ interpreter, Cling, initially developed to enable interactive high-energy physics analysis in a C++ environment.

Clang and LLVM provide access to C++ from other programming languages, but they currently only expose the declared public interfaces of such C++ code, even when they have parsed implementation details directly. Both the high-level and the low-level program representations have enough information to capture and expose more of these details and thereby improve language interoperability. Examples include details of memory management, ownership transfer, thread safety, externalized side-effects, etc. For example, if memory is allocated and returned, the caller needs to take ownership; if a function is pure, calls to it can be elided; if a call provides access to a data member, it can be reduced to an address lookup.
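
As a concrete illustration, the declarations below carry exactly the kinds of facts such an API could surface; the behaviors noted in the comments are what a to-be-designed interface would report, not existing CppInterOp API:

  #include <memory>
  #include <vector>

  struct Histo { std::vector<double> bins; };

  Histo* create();                 // ownership passes to the caller, so a
                                   // binding should wrap and later free it
  std::unique_ptr<Histo> make();   // ownership transfer explicit in the type
  double sum(const Histo& h);      // pure: calls whose results are unused
                                   // could be elided by the binding layer
  double& bin(Histo& h, int i) {   // accessor: reducible to an address
    return h.bins[i];              // computation, no real call needed
  }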

The goal of this project is to develop APIs for CppInterOp capable of extracting and exposing such information from the AST or from JIT-ed code, and to use them in cppyy (Python-C++ language bindings) as an exemplar. If time permits, the work can be extended to persist this information across translation units and to use it on code compiled with Clang.

Task ideas and expected results

  • Collect and categorize the kinds of interop information that could be exposed
  • Write one or more facilities to extract necessary implementation details
  • Design a language-independent interface to expose this information
  • Integrate the work in clang-repl and Cling
  • Implement and demonstrate its use in cppyy as an exemplar
  • Present the work at the relevant meetings and conferences.

Implement and improve an efficient, layered tape with prefetching capabilities

In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. Automatic differentiation is an alternative to symbolic differentiation and numerical differentiation (the method of finite differences). Clad is based on Clang, which provides the necessary facilities for code transformation. The AD library can differentiate non-trivial functions, find partial derivatives in trivial cases, and has good unit-test coverage.

The most heavily used entity in AD is a stack-like data structure called a tape. The first-in, last-out access pattern that naturally occurs when storing intermediate values for reverse-mode AD lends itself to asynchronous storage: asynchronously prefetching values during the reverse pass allows checkpoints deeper in the stack to be stored further away in the memory hierarchy. Checkpointing provides a mechanism to parallelize segments of a function that can be executed on independent cores; inserting checkpoints in these segments using separate tapes keeps memory local and avoids sharing memory between cores. We will research techniques for local parallelization of the gradient reverse pass and extend them to achieve better scalability and/or lower constant overheads on CPUs and potentially accelerators. We will evaluate techniques for efficient memory use, such as multi-level checkpointing support. Combining already-developed techniques will allow executing gradient segments across different cores or in heterogeneous computing systems. These techniques must be robust and user-friendly, and they must minimize the required application code and build system changes.

This project aims to improve the efficiency of the clad tape and generalize it into a tool-agnostic facility that could be used outside of clad as well.
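
A minimal sketch of the slab-based layout mentioned in the first task below (illustrative only, not clad's existing tape API): growing the tape allocates a new slab instead of reallocating and moving existing elements.

  #include <cstddef>
  #include <memory>
  #include <vector>

  template <typename T, std::size_t SlabSize = 1024>
  class SlabTape {
    std::vector<std::unique_ptr<T[]>> slabs_;
    std::size_t size_ = 0;
  public:
    void push(const T& v) {
      if (size_ == slabs_.size() * SlabSize)                 // all slabs full:
        slabs_.push_back(std::make_unique<T[]>(SlabSize));   // add one more
      slabs_[size_ / SlabSize][size_ % SlabSize] = v;
      ++size_;
    }
    T pop() {  // reverse-pass access: first in, last out
      --size_;
      return slabs_[size_ / SlabSize][size_ % SlabSize];
      // a real implementation could release, spill, or prefetch slabs here
    }
    bool empty() const { return size_ == 0; }
  };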

Task ideas and expected results

  • Optimize the current tape to avoid reallocation on resize, using linked slabs instead of a single contiguous array (as sketched above)
  • Enhance existing benchmarks demonstrating the efficiency of the new tape
  • Make the tape thread safe
  • Implement a multi-layer tape stored partly in memory and partly on disk
  • [Stretch goal] Support CPU-GPU transfer of the tape
  • [Stretch goal] Add infrastructure to enable checkpointing offload to the new tape
  • [Stretch goal] Performance benchmarks

On Demand Parsing in Clang

Clang, like any C++ compiler, parses a sequence of characters as they appear, linearly. The linear character sequence is then turned into tokens and an AST before lowering to machine code. In many cases the end-user code uses only a small portion of the C++ entities from the entire translation unit, but the user still pays the price for compiling all of the redundant ones.

This project proposes to process compilation-heavy C++ entities when they are used rather than eagerly. This approach is already adopted in Clang's CodeGen, where it allows Clang to produce code only for what is actually used. On-demand compilation is expected to significantly reduce peak compilation memory and improve compile time for translation units that use their contents sparsely. In addition, it would have a significant impact on interactive C++, where header inclusion essentially becomes a no-op and entities are parsed only on demand.

The Cling interpreter implements a very naive but efficient cross-translation-unit lazy compilation optimization which scales across hundreds of libraries in the field of high-energy physics. Consider:

  // A.h
  #include <string>
  #include <vector>
  template <class T, class U = int> struct AStruct {
    void doIt() { /*...*/ }
    const char* data;
    // ...
  };

  template<class T, class U = AStruct<T>>
  inline void freeFunction() { /* ... */ }
  inline void doit(unsigned N = 1) { /* ... */ }

  // Main.cpp
  #include "A.h"
  int main() {
    doit();
    return 0;
  }

This pathological example expands to 37253 lines of code to process. Cling builds an index (it calls it an autoloading map) which contains only forward declarations of these C++ entities, about 3000 lines of code in size.

The index looks like:

  // A.h.index
  namespace std { inline namespace __1 {
    template <class _Tp, class _Allocator>
    class __attribute__((annotate("$clingAutoload$vector")))
          __attribute__((annotate("$clingAutoload$A.h"))) __vector_base;
  }}
  ...
  template <class T, class U = int>
  struct __attribute__((annotate("$clingAutoload$A.h"))) AStruct;

Upon requiring the complete type of an entity, Cling includes the relevant header file to get it. There are several trivial workarounds to deal with default arguments and default template arguments, since they now appear on both the forward declaration and the definition.

Although the implementation could not be called a reference implementation, it shows that Clang's Parser and Preprocessor are relatively stateless and can be used to process character sequences that are not linear in nature. In particular, namespace-scope definitions are relatively easy to handle, and it is not very difficult to return to namespace scope when we lazily parse something. For other contexts, such as local classes, we will have lost some essential information, such as the name lookup tables for local entities. However, these cases are probably not very interesting, as lazy parsing is probably worth doing only at the granularity of top-level entities.

Such an implementation can help with existing issues in the standard, such as CWG2335, under which the delayed portions of classes get parsed immediately when they are first needed, if that first use precedes the end of the class. That should give good motivation to upstream all the operations needed to return to an enclosing scope and parse something.

Implementation approach:

Upon seeing a tag definition during parsing, we could create a forward declaration, record the token sequence, and mark it as a lazy definition. Later, upon a request for the complete type, we could re-position the parser to parse the definition body. We already skip some template specializations in a similar way [commit, commit].

Another approach is for every lazily parsed entity to record its token stream, changing the Toks stored on LateParsedDeclarations to optionally refer to a subsequence of the externally stored token sequence instead of storing its own copy (or perhaps changing CachedTokens so it can do that transparently). One challenge is that we currently modify the cached token list to append an "eof" token, but it should be possible to handle that in a different way.

In some cases, a class definition can affect its surrounding context in ways that require care:

1) struct X appearing inside the class can introduce the name X into the enclosing context.

2) static inline declarations can introduce global variables with non-constant initializers that may have arbitrary side-effects.
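
To illustrate both points:

  int g;
  int init() { return ++g; }  // stand-in for an arbitrary side-effect

  struct Outer {
    struct X* ptr;  // (1) this first use of 'struct X' declares ::X in the
                    //     enclosing namespace scope, not inside Outer
    static inline int counter = init();  // (2) a global whose non-constant
                                         //     initializer must still run
  };

  X* global;  // valid only because the member above introduced ::X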

Point (2) is an instance of a more general problem: parsing any expression can trigger the instantiation of a class template that has a static data member with an initializer that has side-effects. Unlike the cases above, there is no way to correctly detect and handle such cases by a simple analysis of the token stream; actual semantic analysis is required. Perhaps, if such instantiations happen only in code that is itself unused, it would not be terrible for Clang to have a language mode that does not guarantee that they actually happen.

An alternative and more efficient implementation could make the lookup tables range-based, but we do not yet have even a prototype proving that this approach is feasible.

Task ideas and expected results

  • Design and implementation of on-demand compilation for non-templated functions
  • Support non-templated structs and classes
  • Run performance benchmarks on relevant codebases and prepare report
  • Prepare a community RFC document
  • [Stretch goal] Support templates

The successful candidate should commit to regular participation in weekly meetings, deliver presentations, and contribute blog posts as requested. Additionally, they should demonstrate the ability to navigate the community process with patience and understanding.

Improve automatic differentiation of object-oriented paradigms using Clad

Clad is an automatic differentiation (AD) Clang plugin for C++. Given the C++ source code of a mathematical function, it can automatically generate C++ code for computing derivatives of the function. Clad has found uses in statistical analysis and uncertainty assessment applications.

Object-oriented paradigms (OOP) provide a structured approach for complex use cases, allowing for modular components that can be reused and extended. OOP also allows for abstraction, which makes code easier to reason about and maintain. Full OOP support is an open research area for automatic differentiation codes.

This project focuses on improving support for differentiating object-oriented constructs in Clad, allowing users to seamlessly compute derivatives of algorithms in projects that use an object-oriented model. C++ object-oriented constructs include, but are not limited to: classes, inheritance, polymorphism, and related features such as operator overloading.
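
For example, code of the following shape should be differentiable (a sketch; whether it compiles today depends on Clad's current operator-overload support):

  #include "clad/Differentiator/Differentiator.h"

  struct Complex {
    double re, im;
    Complex operator*(const Complex& o) const {
      return {re * o.re - im * o.im, re * o.im + im * o.re};
    }
  };

  double sqNormOfSquare(double re, double im) {
    Complex z{re, im};
    Complex z2 = z * z;  // user-defined operator inside the call graph
    return z2.re * z2.re + z2.im * z2.im;
  }

  int main() {
    auto grad = clad::gradient(sqNormOfSquare);  // reverse-mode AD
    double dre = 0, dim = 0;
    grad.execute(1.0, 2.0, &dre, &dim);
  }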

Task ideas and expected results

There are several foreseen tasks:

  • Study the current object-oriented differentiable programming support in Clad. Prepare a report on the missing constructs that should be added to support automatic differentiation of object-oriented paradigms in both forward-mode and reverse-mode AD. Known gaps include: differentiation of constructors, limited support for differentiating operator overloads, reference class members, and the lack of a way to specify custom derivatives for constructors.
  • Add support for the missing constructs.
  • Add proper tests and documentation.

Broaden the Scope for the Floating-Point Error Estimation Framework in Clad

In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. Automatic differentiation is an alternative to symbolic differentiation and numerical differentiation (the method of finite differences). Clad is based on Clang, which provides the necessary facilities for code transformation. The AD library can differentiate non-trivial functions, find partial derivatives in trivial cases, and has good unit-test coverage.

Clad also possesses the capability to annotate given source code with floating-point error estimation code. This allows Clad to compute floating-point-related errors in the given function on the fly, to reason about the numerical stability of the function, and to analyze the sensitivity of the variables involved.

The idea behind this project is to develop benchmarks and improve the floating-point error estimation framework as necessary, to find compelling real-world use cases for the tool, and to investigate the possibility of performing lossy compression with it.

On successful completion of the project, the framework should have a sufficiently large set of benchmarks and example usages. Moreover, the framework should be able to run the following code as expected:

#include <iostream>
#include "clad/Differentiator/Differentiator.h"

// Some complicated function made up of doubles.
double someFunc(double F1[], double F2[], double V3[], double COUP1,
                double COUP2)
{
  double cI = 1;
  double TMP3;
  double TMP4;
  TMP3 = (F1[2] * (F2[4] * (V3[2] + V3[5]) + F2[5] * (V3[3] + cI * (V3[4]))) +
          F1[3] * (F2[4] * (V3[3] - cI * (V3[4])) + F2[5] * (V3[2] - V3[5])));
  TMP4 = (F1[4] * (F2[2] * (V3[2] - V3[5]) - F2[3] * (V3[3] + cI * (V3[4]))) +
          F1[5] * (F2[2] * (-V3[3] + cI * (V3[4])) + F2[3] * (V3[2] + V3[5])));
  return (-1.) * (COUP2 * (+cI * (TMP3) + 2. * cI * (TMP4)) +
                  cI * (TMP3 * COUP1));
}

int main() {
  auto df = clad::estimate_error(someFunc);
  // This call should generate a report to decide
  // which variables can be downcast to a float.
  df.execute(args...);
}

Task ideas and expected results

The project consists of the following tasks:

  • Add at least 5 benchmarks and use them to evaluate the framework's correctness and performance.
  • Compile at least 3 real-world examples that are complex enough to demonstrate the capabilities of the framework.
  • Solve any general-purpose issues that come up with Clad during the process.
  • Prepare demos and carry out development needed for lossy compression.

Improve robustness of dictionary to module lookups in ROOT

The LHC smashes groups of protons together at close to the speed of light: 40 million times per second and with seven times the energy of the most powerful accelerators built to date. Many of these collisions are glancing blows, but some are head-on and very energetic. When this happens, some of the energy of the collision is turned into mass, and previously unobserved, short-lived particles – which could give clues about how Nature behaves at a fundamental level – fly out into the detector. Our work includes the experimental discovery of the Higgs boson, which led to a Nobel Prize for the underlying theory that predicted the Higgs boson as an important piece of the Standard Model of particle physics.

CMS is a particle detector designed to see a wide range of particles and phenomena produced in high-energy collisions at the LHC. Like a cylindrical onion, its different layers of detectors measure different particles and use this key data to build up a picture of events at the heart of the collision. CMSSW is the collection of software for the CMS experiment, responsible for the collection and processing of information about the particle collisions at the detector. CMSSW uses the ROOT framework to provide support for data storage and processing. ROOT relies on Cling, Clang, and LLVM to automatically build efficient I/O representations of the necessary C++ objects. The I/O properties of each object are described in a compilable C++ file called a dictionary. ROOT's I/O dictionary system relies on C++ modules to improve its overall memory footprint.

The few runtime failures in the modules integration builds of CMSSW are due to dictionaries that cannot be found by the modules system. These dictionaries are present, and the mainstream system is able to find them using a broader search. The modules setup in ROOT needs to be extended with a dictionary extension that tracks dictionary<->module mappings for C++ entities that introduce synonyms rather than declarations (e.g. using MyVector = std::vector<A<B>>;, where the dictionaries of A and B are elsewhere), as illustrated below.
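
Schematically (the entities are hypothetical):

  #include <vector>

  template <typename T> struct A { T value; };  // dictionary provided by libA
  struct B {};                                  // dictionary provided by libB

  // A third dictionary introduces only a synonym, not a new declaration:
  using MyVector = std::vector<A<B>>;

  // A lookup of "MyVector" must still resolve to that third dictionary's
  // module, which the current dictionary<->module mapping does not record.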

Task ideas and expected results

The project consists of the following tasks:

  • For an alias declaration of the kind using MyVector = std::vector<A<B>>;, store its ODRHash in the respective dictionary file as a number attached to a special variable that can be retrieved at symbol-scanning time.
  • Track down the test failures of CMSSW and check if the proposed implementation works.
  • Develop tutorials and documentation.
  • Present the work at the relevant meetings and conferences.

Enhance the incremental compilation error recovery in clang and clang-repl

The Clang compiler is part of the LLVM compiler infrastructure and supports various languages such as C, C++, ObjC and ObjC++. The design of LLVM and Clang enables them to be used as libraries, and has led to the creation of an entire compiler-assisted ecosystem of tools. The relatively friendly codebase of Clang and advancements in the JIT infrastructure in LLVM further enable research into different methods for processing C++ by blurring the boundary between compile time and runtime. Challenges include incremental compilation and fitting compile/link time optimizations into a more dynamic environment.

Incremental compilation pipelines process code chunk by chunk by building an ever-growing translation unit. Code is then lowered into LLVM IR and subsequently run by the LLVM JIT. Such a pipeline allows the creation of efficient interpreters. The interpreter enables interactive exploration and makes the C++ language more user-friendly. The incremental compilation mode is used by the interactive C++ interpreter, Cling, initially developed to enable interactive high-energy physics analysis in a C++ environment.

Our group works to incorporate, and possibly redesign, parts of Cling in mainline Clang through a new tool, clang-repl. This project aims at enhancing the error recovery when users type C++ at the clang-repl prompt.
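
For example, a failing line at the prompt must not poison subsequent input (an illustrative session):

  clang-repl> int x = undeclared_name;
  // error: use of undeclared identifier 'undeclared_name'
  clang-repl> int x = 42;  // must succeed: the broken declaration of 'x'
                           // has to be rolled back, not half-registered
  clang-repl> template <typename T> T twice(T t) { return t + t; }
  clang-repl> twice("ab"); // instantiation errors must unwind just as cleanly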

Task ideas and expected results

There are several tasks to improve the currently rudimentary error recovery:

  • Extend the test coverage for error recovery
  • Find and fix cases where there are bugs
  • Implement template instantiation error recovery support
  • Implement argument-dependent lookup (ADL) recovery support

Improve Cling’s Development Lifecycle

Cling is an interactive C++ interpreter built on top of the Clang and LLVM compiler infrastructure. Cling realizes the read-eval-print loop (REPL) concept in order to enable rapid application development. Implemented as a small extension to LLVM and Clang, the interpreter reuses their strengths, such as the praised concise and expressive compiler diagnostics.

Task ideas and expected results

The project foresees enhancing the GitHub Actions infrastructure by adding development-process automation tools:

  • Code Coverage information (codecov)
  • Static code analysis (clang-tidy)
  • Coding conventions checks (clang-format)
  • Release binary upload automation

Allow redefinition of CUDA functions in Cling

Cling is an interactive C++ interpreter built on top of the Clang and LLVM compiler infrastructure. Cling realizes the read-eval-print loop (REPL) concept in order to enable rapid application development. Implemented as a small extension to LLVM and Clang, the interpreter reuses their strengths, such as the praised concise and expressive compiler diagnostics.

Since the development of Cling started, it has gained new features that enable new workflows. One such feature is the CUDA mode, which allows you to interactively develop and run CUDA code on Nvidia GPUs. Another feature is the redefinition of functions, variables, classes and more, bypassing the one-definition rule of C++, which enables comfortable rapid prototyping in C++. Currently, the two features cannot be used together, because parsing and executing CUDA code behaves differently compared to pure C++.
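
An illustrative session (the pure C++ case works today; making the CUDA case behave the same way is the goal of this project):

  [cling]$ double scale(double x) { return 2. * x; }
  [cling]$ double scale(double x) { return 3. * x; }  // OK: redefinition
                                                      // shadows the old one
  [cling]$ __global__ void k(float* p) { *p = 1.f; }
  [cling]$ __global__ void k(float* p) { *p = 2.f; }  // currently an error
                                                      // in CUDA mode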

Task ideas and expected results

The task is to adapt the redefinition feature of the pure C++ mode for the CUDA mode. To do this, the student must develop solutions to known and unknown problems that arise when parsing and executing CUDA code.

Developing C++ modules support in CMSSW and Boost

The LHC smashes groups of protons together at close to the speed of light: 40 million times per second and with seven times the energy of the most powerful accelerators built to date. Many of these collisions are glancing blows, but some are head-on and very energetic. When this happens, some of the energy of the collision is turned into mass, and previously unobserved, short-lived particles – which could give clues about how Nature behaves at a fundamental level – fly out into the detector. Our work includes the experimental discovery of the Higgs boson, which led to a Nobel Prize for the underlying theory that predicted the Higgs boson as an important piece of the Standard Model of particle physics.

CMS is a particle detector that is designed to see a wide range of particles and phenomena produced in high-energy collisions in the LHC. Like a cylindrical onion, different layers of detectors measure the different particles, and use this key data to build up a picture of events at the heart of the collision.

Last year, thanks to Lucas Calmolezi and GSoC, the usage of Boost in CMSSW was modernized. This improved the C++ modules support of the local Boost fork.

Task ideas and expected results

Many of the accumulated local patches add missing includes to the relevant Boost header files (an illustrative patch is sketched below). The candidate should start by proposing the existing patches to the Boost community and then try to compile more Boost-specific modules, which is mostly a mechanical task. The student should be ready to work towards making the C++ module files more efficient, containing fewer duplications, and should be prepared to write a progress report and present the results.
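
An illustrative shape of such a patch (the header and the missing include are hypothetical):

  // boost/foo/bar.hpp
  #include <cstddef>  // added: the header uses std::size_t but relied on a
                      // transitive include, which C++ modules do not provide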

Modernize the LLVM “Building A JIT” tutorial series

The LLVM BuildingAJIT tutorial series teaches readers to build their own JIT class from scratch using LLVM's ORC APIs; however, the tutorial chapters have not kept pace with recent API improvements. The project is to bring the existing tutorial chapters up to speed, write up a new chapter on lazy compilation (the chapter code is already available), or write a new chapter from scratch.
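
For orientation, the series builds up by hand what LLJIT provides out of the box. Below is a minimal sketch against recent ORC APIs; exact signatures drift between LLVM releases, which is precisely why the chapters need updating:

  #include "llvm/ExecutionEngine/Orc/LLJIT.h"
  #include "llvm/ExecutionEngine/Orc/ThreadSafeModule.h"
  #include "llvm/IRReader/IRReader.h"
  #include "llvm/Support/SourceMgr.h"
  #include "llvm/Support/TargetSelect.h"

  using namespace llvm;
  using namespace llvm::orc;

  int main() {
    InitializeNativeTarget();
    InitializeNativeTargetAsmPrinter();
    ExitOnError ExitOnErr;

    // A trivial IR module: int add(int a, int b) { return a + b; }
    auto Ctx = std::make_unique<LLVMContext>();
    SMDiagnostic Err;
    auto M = parseIR(MemoryBufferRef("define i32 @add(i32 %a, i32 %b) {\n"
                                     "  %r = add i32 %a, %b\n  ret i32 %r\n}",
                                     "add"),
                     Err, *Ctx);

    auto J = ExitOnErr(LLJITBuilder().create());
    ExitOnErr(J->addIRModule(ThreadSafeModule(std::move(M), std::move(Ctx))));

    // Chapters 1-3 rebuild this pipeline from ORC layers; Chapter 4 makes
    // compilation lazy, so 'add' is only compiled on first call.
    auto Add = ExitOnErr(J->lookup("add")).toPtr<int (*)(int, int)>();
    return Add(1, 2) == 3 ? 0 : 1;
  }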

Task ideas and expected results

  • Update chapter text for Chapters 1-3 – Easy, but offers a chance to get up to speed on the APIs.
  • Write chapter text for Chapter 4 – Chapter code is already available, but no chapter text exists yet.
  • Write a new chapter from scratch – E.g. How to write an out-of-process JIT, or how to directly manipulate the JIT’d instruction stream using the ObjectLinkingLayer::Plugin API.