Group Members

David J Lange

Research Staff
email: David.Lange@princeton.edu

Education: PhD Physics, University of California, Santa Barbara (1999).

Vassil Vassilev

Research Staff
email: vvasilev@cern.ch

Education: PhD Computer Science, University of Plovdiv “Paisii Hilendarski”, Plovdiv, Bulgaria (2015).

Ioana Ifrim

Research Staff
email: ioana.ifrim@cern.ch

Education: MPhil Advanced Computer Science, University of Cambridge (2018)

Sara Bellei

Google Season of Docs 2022
email: sara.bellei.87@gmail.com

Education: PhD in Physics, Politecnico University of Milan, Italy (2017)

Ongoing project: Improving the Clang-REPL documentation
Clang-REPL is the evolution of Cling, an interactive C++ interpreter based on LLVM and Clang. It was first developed as part of ROOT, the high-energy physics (HEP) data analysis project, and subsequently grew into a standalone tool used outside the HEP community. The main goal of the Clang-REPL project is to move most parts of Cling into LLVM. Adopting the LLVM community standards for code reviews, release cycles and integration will ensure the software’s sustainability and enable it to reach a wider audience. My goal is to establish a protocol for the Clang-REPL documentation that is easy to read from a user’s perspective and easy to update as the code continues to evolve.

Project Proposal: URL

Project Reports:

Mentors: Vassil Vassilev, David Lange

Jun Zhang

Research Intern, former Google Summer of Code 2022 contributor
email: jun@junz.org

Education: Software Engineering, Anhui Normal University, WuHu, China

Ongoing project: Implement value printing in clang-repl
clang-repl is the upstream version of the Cling interpreter and implements only a subset of Cling’s features. This proposal aims to bring value printing to clang-repl, a very useful feature that shows users detailed information about the expressions they have entered.

Project Proposal: URL

Project Reports:

Mentors: Vassil Vassilev, David Lange

Completed project: Optimize ROOT use of modules for large codebases
ROOT is a data analysis framework designed to handle large amounts of data with high performance. This proposal aims at optimizing the performance of ROOT by reducing unnecessary symbol lookup across the very large set of C++ modules.

Project Proposal: URL

Project Reports: Final Report|Blog post

Mentors: Vassil Vassilev, David Lange, Alexander Penev

Anubhab Ghosh

Research Intern, former Google Summer of Code 2022 contributor
email: anubhabghosh.me@gmail.com

Education: Computer Science and Engineering, Indian Institute of Information Technology, Kalyani, India

Ongoing project: Design and Develop a CUDA engine for clang-repl
CUDA is a GPGPU platform and API targeted towards NVIDIA GPUs that gives access to compute elements of the GPU through standard programming languages like C++. The goal of the project is to implement CUDA support for clang-repl that will be useful for interpreting CUDA C++ code. This would possibly require clang-repl to distinguish between host and device code and separately compile device code to PTX.

Project Proposal: URL

Project Reports:

Mentors: Vassil Vassilev, David Lange

Completed project: Shared Memory Based JITLink Memory Manager
When a separate executor process is used with LLVM JIT, the generated code needs to be transferred to the executor process, which is done by the JITLinkMemoryManager. The current implementation uses the ExecutorProcessControl API (an RPC scheme) to send the generated code through pipes or network sockets. The goal of the project is to transfer it through shared memory regions provided by the operating system for better performance when the JIT process and the executor process share the same underlying physical memory. This should be done by allocating large chunks of memory and distributing them, to reduce memory allocation and inter-process communication overheads.

Project Proposal: URL

Project Reports: Final Report|Blog post

Mentors: Vassil Vassilev, Stefan Gränitz, Lang Hames

Garima Singh

Research Intern at CERN
email: garimasingh0028@gmail.com

Education: B. Tech in Information Technology, Manipal Institute of Technology, Manipal, India

Ongoing project: Add Numerical Differentiation Support in Clad
In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. Automatic differentiation is an alternative to symbolic differentiation and numerical differentiation (the method of finite differences). Clad is based on Clang, which provides the necessary facilities for code transformation. The AD library can differentiate non-trivial functions, find a partial derivative for trivial cases, and has good unit test coverage. In several cases, due to various limitations, it is either inefficient or impossible to differentiate a function. For example, Clad cannot differentiate declared-but-not-defined functions and currently issues an error in that case. Instead, Clad should fall back to its future numerical differentiation facilities.
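As a rough illustration of the finite-difference fallback mentioned above, the sketch below approximates a derivative with a central difference. It is written in Python for brevity and is not Clad code; the names `central_difference` and `cube` and the step size `h` are assumptions made only for this example.

```python
def central_difference(f, x, h=1e-5):
    """Approximate f'(x) with the symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def cube(x):
    # Stand-in for a function whose body the AD tool cannot see.
    return x ** 3


# The exact derivative of x**3 at x = 2 is 3 * 2**2 = 12.
approx = central_difference(cube, 2.0)
```

Unlike AD, the result carries a truncation error that depends on `h`, which is why such a scheme is best treated as a fallback rather than a replacement.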

Project Proposal: URL

Project Reports:

Mentors: Vassil Vassilev, Alexander Penev

Completed project: Floating point error evaluation with Clad
Floating-point errors are a testament to the finite nature of computing, and the predominance of floating-point numbers in real-valued computation compounds the problem. Floating-point computations are highly dependent on precision, and in most cases very high precision calculation is either impossible or very inefficient. One then has no choice but to resort to lower-precision computing, which in turn is quite prone to errors. These errors lead to inaccurate and sometimes catastrophic results; hence, it is imperative to estimate them accurately. This project aims to use Clad, a source transformation AD tool for C++ implemented as a plugin for the C++ compiler Clang, to develop a generic error estimation framework that is not bound to a particular error approximation model. It will allow users to select their preferred estimation logic and automatically generate functions augmented with code for the specified error estimator.
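One common way to connect derivatives to floating-point error is a first-order Taylor model, in which each input contributes roughly |∂f/∂x_i · x_i| · ε to the error of the result. The Python sketch below assumes that model and uses a hand-written gradient in place of one generated by an AD tool; `estimate_error`, `f`, and `grad_f` are hypothetical names, and this is not the framework's actual interface.

```python
import sys

EPS = sys.float_info.epsilon  # machine epsilon for double precision


def estimate_error(gradient, inputs, eps=EPS):
    """First-order estimate: sum of |df/dx_i * x_i|, scaled by eps."""
    return sum(abs(g * x) for g, x in zip(gradient, inputs)) * eps


def f(x, y):
    return x * y + x


def grad_f(x, y):
    # Written by hand here; an AD tool would generate this code.
    return (y + 1.0, x)


x, y = 3.0, 4.0
bound = estimate_error(grad_f(x, y), (x, y))  # (|5 * 3| + |3 * 4|) * EPS
```

Swapping `estimate_error` for another function is the kind of model choice a generic framework would leave to the user.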

Project Proposal: URL

Project Reports:

Mentors: Vassil Vassilev, David Lange

Baidyanath Kundu

Research Intern at CERN
email: kundubaidya99@gmail.com

Education: B. Tech in Computer Science and Engineering, Manipal Institute of Technology, Manipal, India

Ongoing project: Improving Cling Reflection for Scripting Languages
Cling has basic facilities to make queries about the C++ code that it has seen or collected so far. These lookups assume, however, that the caller knows what it is looking for, and the information returned, although exact, usually only makes sense within C++ and is thus often too specific to be used as-is. A scripting language, such as Python, that wants to make use of such lookups by name is forced to loop over all possible entities (classes, functions, templates, enums, …) to find a match. This is inefficient. Furthermore, many lookups will be multi-stage: a function, but which overload? A template, but which instantiation? A typedef, of what? The current mechanism forces the scripting language to provide a type-based match, even where C++ makes distinctions (e.g. pointer vs. reference) that do not exist in the scripting language. This, too, makes lookups very inefficient. The returned information, once a match is found, is exact, but because of its specificity it requires the caller to figure out C++ concepts that have no meaning in the scripting language. For example, there is no reason for Python to consider an implicitly instantiated function template different from an explicitly instantiated one.

Project Proposal: URL

Project Reports:

Mentors: Wim Lavrijsen, Vassil Vassilev

Completed project: Utilize second order derivatives from Clad in ROOT
ROOT is a framework for data processing, born at CERN, at the heart of the research on high-energy physics. ROOT has a Clang-based C++ interpreter, Cling, and integrates with the automatic differentiation plugin Clad to provide a flexible automatic differentiation facility. TFormula is a ROOT class which bridges compiled and interpreted code. This project aims to add second order derivative support in TFormula using clad::hessian. The PR that added support for gradients in ROOT is taken as a reference and can be accessed here.

Project Proposal: URL

Project Reports:

Mentors: Vassil Vassilev, Ioana Ifrim

Purva Chaudhari

Research Intern
email: purva.chaudhari02@gmail.com

Education: Computer Science, Vishwakarma Institute of Technology

Ongoing project: Enhance the incremental compilation error recovery in clang and clang-repl
The Clang compiler is part of the LLVM compiler infrastructure and supports various languages such as C, C++, ObjC and ObjC++. Advancements in JIT infrastructure and in the usability of the Clang libraries in LLVM have enabled research into processing C++ incrementally. It has been challenging to fit incremental compilation and compile-time optimizations into a more dynamic environment. Incremental compilation pipelines process code chunk-by-chunk by building an ever-growing translation unit. Code is then lowered into LLVM IR and subsequently run by the LLVM JIT. The incremental compilation mode is used by the interactive C++ interpreter Cling, initially developed to enable interactive high-energy physics analysis in a C++ environment. Clang-repl is a new tool incorporated into the LLVM ecosystem by redesigning parts of Cling in Clang mainline. The project aims to enhance error recovery when users type C++ at the clang-repl prompt.

Project Proposal: URL

Project Reports:

Mentors: Vassil Vassilev, David Lange

Completed project: Reduce boost dependencies in CMSSW
This project aims to reduce CMSSW technical debt by finding and replacing boost dependencies that have an equivalent solution in standard C++. Reducing boost dependencies helps us create more lightweight boost Clang modules for the upcoming C++20. It also reduces the number of headers that we need to work on to be able to use C++20 Clang modules.

Project Proposal: URL

Project Reports: URL

Mentors: Vassil Vassilev, David Lange

This could be you!

See openings for more info
email: vvasilev@cern.ch


Alumni

Surya Somayyajula

IRIS-HEP Fellow
email: somayyajula@wisc.edu

Education: Computer Sciences B.S., University of Wisconsin-Madison

Ongoing project: Improve Cling’s packaging system: Cling Packaging Tool
Cling is an interactive C++ interpreter/compiler that utilizes the REPL (read-evaluate-print-loop) paradigm for fast development and testing as well as immediate feedback and runtime-generated code. One of the many useful tools included in the Cling interpreter is the Cling Packaging Tool (CPT), which is a command line utility that can easily build Cling from source and generate installer bundles for a variety of platforms, including Ubuntu and Debian-based platforms, Windows, distributions based on Red Hat Linux, Mac OS X, and any Unix-like platform. While the CPT is an incredibly useful and flexible tool, there are several improvements that can be made to make the user’s experience with the CPT even more seamless.

Project Proposal: URL

Project Reports:

Mentors: Vassil Vassilev, David Lange

Sunho Kim

Google Summer of Code 2022
email: ksunhokim123@gmail.com

Education: Computer Science, De Anza College, Cupertino, California

Completed project: Write JITLink support for a new format/architecture (ELF/AARCH64)
JITLink is LLVM’s new JIT linker API: the low-level API that transforms compiler output (relocatable object files) into ready-to-execute bytes in memory. With its new architecture, it is able to support a variety of new features, including static initializers, thread-local storage, and the small code model, that were not possible in RuntimeDyld, the old JIT API. JITLink’s generic linker algorithm needs to be specialized to support the target object format (COFF, ELF, MachO) and architecture (arm, arm64, i386, x86-64). This project aims to implement the JITLink specialization for ELF/aarch64, which is required to use JITLink on arm64 Linux.

Project Proposal: URL

Project Reports: Final Report

Mentors: Vassil Vassilev, Stefan Gränitz, Lang Hames

Rohit Singh Rathaur

Google Season of Docs 2022
email: rohitrathore.imh55@gmail.com

Education: Mathematics & Computing, Birla Institute of Technology, Mesra, India

Ongoing project: Improving Interactive Tool Analysis Documentation for the HSF
HEP researchers have developed several unique software technologies in the area of data analysis. Over the last decade we developed an interactive C++ interpreter (a REPL) as part of the ROOT data analysis project. We invested significant effort to replace CINT, the C++ interpreter used until ROOT5, with a newly implemented REPL based on LLVM, called Cling. Cling is a core component of ROOT and has been in production since 2014. Cling is also a standalone tool with a growing community outside of our field. It is recognized for enabling interactivity, dynamic interoperability and rapid prototyping capabilities for C++ developers. For example, if you are typing C++ in a Jupyter notebook you are using the xeus-cling Jupyter kernel. We are in the midst of an important project to address one of the major challenges to ensuring Cling’s sustainability and fostering that growing community: moving most parts of Cling into LLVM. Since LLVM version 13 we have had a version of Cling called Clang-Repl. As we advance the implementation and generalize its usage, we aim to improve the overall documentation experience in the area of interactive C++.

Project Proposal: URL

Project Reports:

Mentors: Vassil Vassilev, David Lange

Manish Kausik H

Google Summer of Code 2022
email: hmanishkausik@gmail.com

Education: B.Tech and M.Tech in Computer Science and Engineering (Dual Degree), Indian Institute of Technology Bhubaneswar

Completed project: Add Initial Integration of Clad with Enzyme
Clad is an open source plugin to the Clang compiler that detects, in the parsed abstract syntax tree (AST), calls to differentiate a defined function, generates code that differentiates the function using automatic differentiation (AD), and modifies the AST to insert the generated code. While Clad works in the frontend of the compilation process, Enzyme, another LLVM-based AD plugin, works in the backend, where it takes in code in LLVM IR form and then differentiates it. This proposal aims to integrate Clad with Enzyme and give users the option of selecting Enzyme for automatic differentiation, based on their needs. Users keep the same Clad interface for writing their code, with the option of using Enzyme as the backend, with all its optimisations, to calculate the derivative or gradient of the requested function. My proposal also briefly gives insights into how this can be achieved by tapping into the existing code base of Clad.

Project Proposal: URL

Project Reports: Final Report|Blog post

Mentors: Vassil Vassilev, David Lange

Tapasweni Pathak

Principal Product Manager at Microsoft Azure Core Engineering
email: tapaswenipathak@gmail.com

Education: B.Tech in Computer Science, Indira Gandhi Delhi Technical University for Women, 2014

Failed project: Improving performance of C++ modules in Clang
The C++ modules technology aims to provide a scalable compilation model for the C++ language. The C++ Modules technology in Clang provides an I/O-efficient, on-disk representation capable of reducing build times and peak memory usage. The internal compiler state, such as the abstract syntax tree (AST), is stored on disk and lazily loaded on demand. C++ Modules improve the memory footprint for interpreted C++ through the Cling C++ interpreter developed by CERN and the compiler research group at Princeton. The current implementation is good at making most operations on demand. However, in a few cases we eagerly load pieces of the AST, for example at module import time and upon selecting a suitable template specialization. When selecting the template specialization, we load all template specializations from the module files just to find out they are not suitable. There is a patch that partially solves this issue by introducing a template argument hash and using it to look up the candidates without deserializing them. However, the data structure it uses to store the hashes leads to a quadratic search, which is inefficient when the number of modules becomes sufficiently large.

Project Proposal: URL

Project Reports:

Mentors: Vassil Vassilev

Parth Arora

IRIS-HEP Fellow
email: partharora99160808@gmail.com

Education: B.Tech in Computer Science, USICT, Guru Gobind Singh Indraprastha University, New Delhi, India

Ongoing project: Add support for custom types in Clad with a focus on the Softsusy library
User-defined types in C++ help to make code more readable and maintainable. Many user programs and almost every major library use user-defined types. It is therefore crucial for Clad to support differentiating user-defined types. The first goal of the project is to add support for differentiating user-defined types in Clad. Clad also currently lacks support for many C++ constructs. Many of these are essential and widely used in day-to-day programming, such as break and continue statements. The second goal of the project is to battle-test Clad on the Eigen and Softsusy library codebases to find and add support for most of the missing syntax, as well as to improve support for differentiating function calls.

Project Proposal: URL

Project Reports:

Mentors: Vassil Vassilev, David Lange

Completed project: Add support for differentiating functor objects in clad
Differentiation support for functions is available in Clad, but support for direct differentiation of functors and lambda expressions is missing. Many computations are modelled using functors, and both functors and lambda expressions are becoming more and more relevant in modern C++. This project aims to add support for directly differentiating functors and lambda expressions in Clad.

Project Proposal: URL

Project Reports:

Mentors: Vassil Vassilev, David Lange

Matheus Izvekov

Google Summer of Code Contributor 2022
email: mizvekov@gmail.com

Education: Computer Science

Completed project: Preserve type sugar for member access on template specializations
In C++, it’s often useful to write wrappers that abstract or extend some underlying type passed as a template argument. But templates are only instantiated taking into account the ‘fundamental’ types of the arguments, discarding ‘type sugar’ such as aliases, attributes, or other cosmetic metadata such as how the name of the type was qualified. While this sugar is brittle to rely on in practice, attributes on the type itself or a typedef thereof can have many interesting non-cosmetic effects, like changing data alignment, calling conventions, and other custom or domain-specific functionality. We refer to such ‘fundamental’ types as ‘canonical’ types here. Without any further engineering to work around this limitation, member accesses on template specializations will only reflect these canonical types, with the simplest example being the loss of any sugar on the argument when accessing a member alias to the argument itself. For this project, we will improve Clang’s type system so that any type sugar on the arguments of a template specialization is pushed into those member accesses.

Project Proposal: URL

Project Reports: Final Report|Blog post

Mentors: Vassil Vassilev, Richard Smith

Ajay Uppili Arasanipalai

Google Summer of Code Student 2021
email: aua2@illinois.edu

Education: University of Illinois at Urbana-Champaign, Grainger College of Engineering

project: Modernize the LLVM “Building A JIT” Tutorial Series
The LLVM JIT API has changed many times over the years. However, the official tutorials have failed to keep up. This project aims to update the official “Building a JIT” tutorials to use the latest version of the OrcJIT API and add new content that might be relevant to new LLVM users interested in writing their own JIT compilers.

Project Proposal: URL

Project Reports:

Mentors: Lang Hames, Vassil Vassilev

Vaibhav Garg

Google Summer of Code Student 2020
email: gargvaibhav64@gmail.com

Education: Computer Science, Birla Institute of Technology and Science, Pilani, India

Completed project: Enable Modules on Windows
ROOT has several features that interact with libraries and require implicit header inclusion. This can be triggered by reading or writing data on disk, or by user actions at the prompt. Exposing the full shared library descriptors to the interpreter at runtime translates into an increased memory footprint. ROOT’s exploratory programming concepts allow implicit and explicit runtime shared library loading, which requires the interpreter to load the library descriptor. Re-parsing of descriptors’ content has a noticeable effect on runtime performance. C++ Modules are designed to minimize the reparsing of the same header content by providing an efficient on-disk representation of the C++ code. C++ Modules have already been implemented for Unix and OS X systems, and it is expected that with the next release of ROOT, C++ Modules will be the default on OS X. This project aims to extend the C++ Modules support to Windows by implementing solutions compatible with the Unix baseline, and to report the corresponding performance results.

Project Proposal: URL

Project Reports: GSoC 2020 Archive

Mentors: Vassil Vassilev, Bertrand Bellenot

Lucas Camolezi

Google Summer of Code Student 2020
email: camolezi@usp.br

Education: Computer Engineering, University of São Paulo, Brazil

Completed project: Reduce boost dependence in CMSSW
This project aims to find and reduce boost dependencies in CMSSW. Modern C++ introduced many features that were previously only available through boost packages, so some boost code can be replaced with similar C++ standard library features. Using standard features is good practice, and this project will move the CMSSW codebase in that direction.

Project Proposal: URL

Project Reports: GSoC 2020 Archive

Mentors: Vassil Vassilev, David Lange

Roman Shakhov

Google Summer of Code Student 2020
email: r.intval@gmail.com

Education: Mathematics and Computer Science, Voronezh State University, Russia

project: Extend clad to compute Jacobians
In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. Automatic differentiation is an alternative to symbolic differentiation and numerical differentiation (the method of finite differences). Clad is based on Clang, which provides the necessary facilities for code transformation. The AD library is able to differentiate non-trivial functions, to find a partial derivative for trivial cases, and has good unit test coverage. Currently, Clad does not provide an easy way to compute Jacobians.
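To make the notion of a Jacobian computed by AD concrete, here is a small forward-mode sketch using dual numbers, written in Python for brevity. It is an illustration of the general technique only, not how Clad (a source-transformation tool) works internally; the `Dual` class and the `jacobian` helper are hypothetical names, and only addition and multiplication are supported.

```python
class Dual:
    """A value paired with its derivative along one input direction."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def jacobian(f, point):
    """Build the Jacobian of f at point, one seeded input at a time."""
    columns = []
    for i in range(len(point)):
        seeded = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(point)]
        columns.append([out.deriv for out in f(*seeded)])
    # Each pass yields one column; transpose so rows correspond to outputs.
    return [list(row) for row in zip(*columns)]


def f(x, y):
    return (x * y, x + 2 * y)


J = jacobian(f, [3.0, 4.0])  # rows: d(x*y)/d(x, y) and d(x + 2y)/d(x, y)
```

Forward mode needs one pass per input, so the cost of the full Jacobian grows with the number of inputs.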

Project Proposal: URL

Project Reports: Poster

Mentors: Vassil Vassilev, Alexander Penev