Research Staff
email: David.Lange@princeton.edu
Education: PhD Physics, University of California, Santa Barbara (1999).
Research Staff
email: vvasilev@cern.ch
Education: PhD Computer Science, University of Plovdiv “Paisii Hilendarski”, Plovdiv, Bulgaria (2015).
Research Staff
email: ioana.ifrim@cern.ch
Education: MPhil Advanced Computer Science, University of Cambridge (2018)
Google Season of Docs 2022
email: sara.bellei.87@gmail.com
Education: PhD in Physics, Politecnico University of Milan, Italy (2017)
Ongoing project:
Improving the Clang-REPL documentation
Clang-REPL is the evolution of Cling, an interactive C++ interpreter based on LLVM and
Clang. Cling was first developed as part of the high-energy physics (HEP) data analysis
project ROOT, and subsequently grew into a standalone tool used outside the HEP community.
The main goal behind the Clang-REPL project is to move most parts of Cling into LLVM.
Adopting the LLVM community standards for code reviews, release cycles and integration
will help ensure the software’s sustainability and enable it to reach a wider audience.
My goal is to establish a protocol for the Clang-REPL documentation that is easy to read
from the user’s perspective and easy to update as the code continues to evolve.
Project Proposal: URL
Project Reports:
Mentors: Vassil Vassilev, David Lange
Research Intern, Google Summer of Code 2022 former contributor
email: jun@junz.org
Education: Software Engineering, Anhui Normal University, WuHu, China
Ongoing project:
Implement value printing in clang-repl
clang-repl is the upstream version of the Cling interpreter and implements only a subset of
Cling’s features. This proposal aims to bring value printing to clang-repl, a very useful
feature that shows users detailed information about the expressions they have entered.
Project Proposal: URL
Project Reports:
Mentors: Vassil Vassilev, David Lange
Completed project:
Optimize ROOT use of modules for large codebases
ROOT is a data analysis framework designed to handle large amounts of
data with high performance. This proposal aims at optimizing the
performance of ROOT by reducing unnecessary symbol lookup across the
very large set of C++ modules.
Project Proposal: URL
Project Reports: Final Report|Blog post
Mentors: Vassil Vassilev, David Lange, Alexander Penev
Research Intern, Google Summer of Code 2022 former contributor
email: anubhabghosh.me@gmail.com
Education: Computer Science and Engineering, Indian Institute of Information Technology, Kalyani, India
Ongoing project:
Design and Develop a CUDA engine for clang-repl
CUDA is a GPGPU platform and API targeted towards NVIDIA GPUs that gives access to compute
elements of the GPU through standard programming languages like C++. The goal of the project
is to implement CUDA support for clang-repl that will be useful for interpreting CUDA C++
code. This would possibly require clang-repl to distinguish between host and device code
and separately compile device code to PTX.
Project Proposal: URL
Project Reports:
Mentors: Vassil Vassilev, David Lange
Completed project:
Shared Memory Based JITLink Memory Manager
When a separate executor process is used with LLVM JIT, the generated code needs to be
transferred to the executor process, which is done by the JITLinkMemoryManager. The current
implementation uses the ExecutorProcessControl API (an RPC scheme) to send the generated
code through pipes or network sockets. The goal of the project is to transfer it
through operating-system-provided shared memory regions for better performance when
both the JIT process and the executor process share the same underlying physical
memory. This should be done by allocating large chunks of memory and distributing them to
reduce memory allocation and inter-process communication overheads.
Project Proposal: URL
Project Reports: Final Report|Blog post
Mentors: Vassil Vassilev, Stefan Gränitz, Lang Hames
Research Intern at CERN
email: garimasingh0028@gmail.com
Education: B. Tech in Information Technology, Manipal Institute of Technology, Manipal, India
Ongoing project:
Add Numerical Differentiation Support in Clad
In mathematics and computer algebra, automatic differentiation (AD) is a set
of techniques to numerically evaluate the derivative of a function specified
by a computer program. Automatic differentiation is an alternative technique
to symbolic differentiation and numerical differentiation (the method of finite
differences). Clad is based on Clang, which provides the necessary facilities
for code transformation. The AD library can differentiate non-trivial functions,
find a partial derivative for trivial cases, and has good unit test coverage.
In several cases, due to different limitations, it is either inefficient or
impossible to differentiate a function. For example, Clad cannot differentiate
declared-but-not-defined functions. In that case, it issues an error. Instead,
Clad should fall back to its future numerical differentiation facilities.
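To illustrate the method of finite differences mentioned above, here is a minimal sketch of a central-difference derivative in plain C++. It is not Clad's implementation; the names `central_difference` and `opaque_square` and the default step size are assumptions chosen for this example.

```cpp
#include <cmath>
#include <functional>

// Central finite difference: f'(x) ~ (f(x + h) - f(x - h)) / (2h).
// The default step h is an illustrative choice, not a tuned value.
double central_difference(const std::function<double(double)>& f, double x,
                          double h = 1e-5) {
  return (f(x + h) - f(x - h)) / (2.0 * h);
}

// Hypothetical stand-in for a function whose body an AD tool cannot see
// (for example, one that is only declared in the translation unit).
double opaque_square(double x) { return x * x; }
```

Calling `central_difference(opaque_square, 3.0)` approximates the derivative 2x at x = 3. Unlike AD, the result carries truncation and rounding error that depends on the step size.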
Project Proposal: URL
Project Reports:
Mentors: Vassil Vassilev, Alexander Penev
Completed project:
Floating point error evaluation with Clad
Floating-point estimation errors have been a testament to the finite nature
of computing. Moreover, the predominance of floating-point numbers in
real-valued computation does not help that fact. Floating-point computations are
highly dependent on precision, and in most cases, very high precision
calculation is either impossible or very inefficient. Here, one has no
choice but to resort to lower precision computing, which in turn is quite
prone to errors. These errors result in inaccurate and sometimes
catastrophic results; hence, it is imperative to estimate these errors
accurately. This project aims to use Clad, a source transformation AD tool
for C++ implemented as a plugin for the C++ compiler Clang, to develop a
generic error estimation framework that is not bound to a particular error
approximation model. It will allow users to select their preferred
estimation logic and automatically generate functions augmented with code
for the specified error estimator.
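As an illustration of what one error approximation model could look like, the sketch below implements a first-order (Taylor-style) estimate by hand. It assumes the gradient is already available, for example from an AD tool. The function name `estimate_error` and the choice of model are assumptions for this example and do not describe the framework's actual interface.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// First-order error model (one possible model among many):
//   error ~ sum_i |df/dx_i * x_i| * eps
// where eps is the machine epsilon of the lower-precision type.
double estimate_error(const std::vector<double>& gradient,
                      const std::vector<double>& inputs,
                      double eps = std::numeric_limits<float>::epsilon()) {
  double total = 0.0;
  for (std::size_t i = 0; i < gradient.size() && i < inputs.size(); ++i)
    total += std::fabs(gradient[i] * inputs[i]) * eps;
  return total;
}
```

For f(x, y) = x * y at (3, 2) the gradient is (2, 3), so each input contributes |6| * eps to the estimate.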
Project Proposal: URL
Project Reports:
Mentors: Vassil Vassilev, David Lange
Research Intern at CERN
email: kundubaidya99@gmail.com
Education: B. Tech in Computer Science and Engg., Manipal Institute of Technology, Manipal, India
Ongoing project:
Improving Cling Reflection for Scripting Languages
Cling has basic facilities to make queries about the C++ code that it has seen/collected so far.
These lookups assume, however, that the caller knows what it is looking for and the information
returned, although exact, usually only makes sense within C++ and is thus often too specific to
be used as-is. A scripting language, such as Python, that wants to make use of such lookups by
name, is forced to loop over all possible entities (classes, functions, templates, enums, …)
to find a match. This is inefficient. Furthermore, many lookups will be multi-stage: a function,
but which overload? A template, but which instantiation? A typedef, of what? The current
mechanism forces the scripting language to provide a type-based match, even where C++ makes
distinctions (e.g. pointer vs. reference) that do not exist in the scripting language. This,
too, makes lookups very inefficient. The returned information, once a match is found, is exact,
but because of its specificity, requires the caller to figure out C++ concepts that have no
meaning in the scripting language. E.g., there is no reason for Python to consider an implicitly
instantiated function template different from an explicitly instantiated one.
Project Proposal: URL
Project Reports:
Mentors: Wim Lavrijsen, Vassil Vassilev
Completed project:
Utilize second order derivatives from Clad in ROOT
ROOT is a framework for data processing, born at CERN, at the heart of the research on high-energy physics.
ROOT has a Clang-based C++ interpreter, Cling, and integrates with the automatic differentiation plugin Clad
to provide a flexible automatic differentiation facility. TFormula is a ROOT class which bridges compiled and
interpreted code. This project aims to add second order derivative support in TFormula using clad::hessian.
The PR that added support for gradients in ROOT is taken as a reference.
Project Proposal: URL
Project Reports:
Mentors: Vassil Vassilev, Ioana Ifrim
Research Intern
email: purva.chaudhari02@gmail.com
Education: Computer Science, Vishwakarma Institute of Technology
Ongoing project:
Enhance the incremental compilation error recovery in clang and clang-repl
The Clang compiler is part of the LLVM compiler infrastructure and supports various languages such as C,
C++, ObjC and ObjC++. Advancements in JIT infrastructure and the usability of Clang libraries in LLVM have
enabled research into processing C++. It has been challenging to include incremental compilation and
to fit compile-time optimizations into a more dynamic environment.
Incremental compilation pipelines process code chunk-by-chunk by building an ever-growing translation unit.
Code is then lowered into the LLVM IR and subsequently run by the LLVM JIT. The incremental compilation mode
is used by the interactive C++ interpreter, Cling, initially developed to enable interactive high-energy physics
analysis in a C++ environment. Clang-repl is a new tool incorporated into the LLVM ecosystem by redesigning parts
of Cling in Clang mainline. The project aims at enhancing the error recovery when users type C++ at the prompt of clang-repl.
Project Proposal: URL
Project Reports:
Mentors: Vassil Vassilev, David Lange
Completed project:
Reduce boost dependencies in CMSSW
This project aims to reduce CMSSW technical debt by finding
and replacing Boost dependencies that have an equivalent solution in
standard C++. Reducing Boost dependencies helps us create more lightweight
Boost Clang modules for the upcoming C++20. This also reduces the number of
headers that we need to work on to be able to use C++20 Clang modules.
Project Proposal: URL
Project Reports: URL
Mentors: Vassil Vassilev, David Lange
IRIS-HEP Fellow
email: somayyajula@wisc.edu
Education: Computer Sciences B.S., University of Wisconsin-Madison
Ongoing project:
Improve Cling’s packaging system: Cling Packaging Tool
Cling is an interactive C++ interpreter/compiler that utilizes the REPL
(read-evaluate-print-loop) paradigm for fast development and testing as
well as immediate feedback and runtime-generated code. One of the many
useful tools included in the Cling interpreter is the Cling Packaging
Tool (CPT), which is a command line utility that can easily build Cling
from source and generate installer bundles for a variety of platforms,
including Ubuntu and Debian-based platforms, Windows, distributions
based on Red Hat Linux, Mac OS X, and any Unix-like platform. While the
CPT is an incredibly useful and flexible tool, there are several
improvements that can be made to make the user’s experience with the CPT
even more seamless.
Project Proposal: URL
Project Reports:
Mentors: Vassil Vassilev, David Lange
Google Summer of Code 2022
email: ksunhokim123@gmail.com
Education: Computer Science, De Anza College, Cupertino, California
Completed project:
Write JITLink support for a new format/architecture (ELF/AARCH64)
JITLink is LLVM’s new JIT linker API – the low-level API that transforms compiler output
(relocatable object files) into ready-to-execute bytes in memory. With its new architecture,
it is able to support a variety of new features, including static initializers, thread-local
storage, and the small code model, that were not possible in RuntimeDyld, the old JIT API.
JITLink’s generic linker algorithm needs to be specialized to support the target object format
(COFF, ELF, MachO) and architecture (arm, arm64, i386, x86-64). This project aims to implement
the JITLink specialization for ELF/aarch64, which is required to use JITLink on arm64 Linux.
Project Proposal: URL
Project Reports: Final Report
Mentors: Vassil Vassilev, Stefan Gränitz, Lang Hames
Google Season of Docs 2022
email: rohitrathore.imh55@gmail.com
Education: Mathematics & Computing, Birla Institute of Technology, Mesra, India
Ongoing project:
Improving Interactive Tool Analysis Documentation for the HSF
HEP researchers have developed several unique software technologies in the area of
data analysis. Over the last decade we developed an interactive C++
interpreter (aka REPL) as part of the ROOT data analysis project. We invested
significant effort to replace CINT, the C++ interpreter used until ROOT5, with a
newly implemented REPL based on LLVM: Cling. Cling is a core component of ROOT and
has been in production since 2014. Cling is also a standalone tool, which has a
growing community outside of our field. It is recognized for enabling interactivity,
dynamic interoperability and rapid prototyping capabilities for C++ developers. For
example, if you are typing C++ in a Jupyter notebook you are using the xeus-cling
Jupyter kernel. We are in the midst of an important project to address one of the major
challenges to ensuring Cling’s sustainability and to foster that growing community: moving
most parts of Cling into LLVM. Since LLVM version 13 we have a version of Cling called
Clang-Repl. As we advance the implementation and generalize its usage, we aim to
improve the overall documentation experience in the area of interactive C++.
Project Proposal: URL
Project Reports:
Mentors: Vassil Vassilev, David Lange
Google Summer of Code 2022
email: hmanishkausik@gmail.com
Education: B.Tech and M.Tech in Computer Science and Engineering (Dual Degree), Indian Institute of Technology Bhubaneswar
Completed project:
Add Initial Integration of Clad with Enzyme
Clad is an open source plugin to the Clang compiler that detects, in the parsed Abstract
Syntax Tree (AST), calls to differentiate a defined function, generates code that differentiates
the function using Automatic Differentiation (AD), and modifies the AST to insert the generated
code. While Clad works in the frontend of the compilation
process, Enzyme, another LLVM-based AD plugin, works in the backend, where it takes in code
in LLVM IR form and then differentiates the code. This proposal aims to integrate Clad with Enzyme,
and give users the option of selecting Enzyme for Automatic Differentiation, based on their needs. This will give users
the same user interface as Clad for writing their code, but with the option of using Enzyme as the
backend, with all its optimisations, to calculate the derivative or gradient of the requested function.
My proposal also briefly gives insights into how this can be achieved by tapping into the
existing code base of Clad.
Project Proposal: URL
Project Reports: Final Report|Blog post
Mentors: Vassil Vassilev, David Lange
Principal Product Manager at Microsoft Azure Core Engineering
email: tapaswenipathak@gmail.com
Education: B.Tech in Computer Science, Indira Gandhi Delhi Technical University for Women, 2014
Failed project:
Improving performance of C++ modules in Clang
The C++ modules technology aims to provide a scalable compilation model for the
C++ language. The C++ Modules technology in Clang provides an I/O-efficient,
on-disk representation capable of reducing build times and peak memory usage. The
internal compiler state such as the abstract syntax tree (AST) is stored on disk
and lazily loaded on demand. C++ Modules improve the memory footprint for
interpreted C++ through the Cling C++ interpreter developed by CERN and the
compiler research group at Princeton. The current implementation is pretty good
at making most operations on demand. However, in a few cases, we eagerly load
pieces of the AST, for example at module import time and upon selecting a
suitable template specialization. When selecting the template specialization
we load all template specializations from the module files just to find out they
are not suitable. There is a patch that partially solves this issue by
introducing a template argument hash and using it to look up the candidates
without deserializing them. However, the data structure it uses to store the
hashes leads to a quadratic search, which is inefficient when the number of modules
becomes sufficiently large.
Project Proposal: URL
Project Reports:
Mentors: Vassil Vassilev
IRIS-HEP Fellow
email: partharora99160808@gmail.com
Education: B.Tech in Computer Science, USICT, Guru Gobind Singh Indraprastha University, New Delhi, India
Ongoing project:
Add support for custom types in Clad with a focus on the Softsusy library
User-defined types in C++ help to make code more readable and maintainable.
Many user-defined programs and almost every major library use user-defined
types. Thus it is crucial for Clad to support differentiating user-defined
types. The first goal of the project is to add support for differentiating
user-defined types in Clad. Clad currently also does not support many C++
constructs. Many of these are essential and widely used in day-to-day
programming, such as break and continue statements. The second goal of the
project is to battle-test Clad on the Eigen and Softsusy library codebases to
find and add support for most of the missing syntax as well as to improve
support for differentiating function calls.
Project Proposal: URL
Project Reports:
Mentors: Vassil Vassilev, David Lange
Completed project:
Add support for differentiating functor objects in clad
Differentiation support for functions is available in Clad, but support for direct
differentiation of functors and lambda expressions is missing. Many computations are
modelled using functors, and both functors and lambda expressions are becoming more and
more relevant in modern C++. This project aims to add support for directly differentiating
functors and lambda expressions in Clad.
Project Proposal: URL
Project Reports:
Mentors: Vassil Vassilev, David Lange
Google Summer of Code Contributor 2022
email: mizvekov@gmail.com
Education: Computer Science
Completed project:
Preserve type sugar for member access on template specializations
In C++, it’s often useful to write wrappers that abstract or extend some
underlying type passed as a template argument. But templates are only
instantiated taking into account the ‘fundamental’ types of the
arguments, discarding ‘type sugar’ such as aliases, attributes, or
other cosmetic metadata such as how the name of the type was qualified.
While this ends up in practice being brittle to rely on,
attributes on the type itself or a typedef thereof can have many
interesting non-cosmetic effects, like changing data alignment,
calling conventions, and other custom / domain-specific functionality.
We refer to such ‘fundamental’ types as ‘canonical’ types here.
Without any further engineering to work around this limitation,
member accesses on template specializations will only reflect these
canonical types, with the simplest example being the loss of any sugar
on the argument when accessing a member alias to the argument itself.
For this project, we will improve Clang’s type system so that any
type sugar on the arguments of a template specialization is pushed into
those member accesses.
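The simplest case described above can be sketched in a few lines of standard C++. The names `Meters` and `Wrapper` are hypothetical and exist only for this illustration; the snippet shows the language-level behaviour, not any change made to Clang.

```cpp
#include <type_traits>

// 'Meters' is sugar for double: a different spelling, the same canonical type.
using Meters = double;

template <typename T>
struct Wrapper {
  using value_type = T;  // member alias to the template argument itself
  T value;
};

// Both spellings name the same specialization, so the specialization alone
// cannot remember that its argument was written as 'Meters'.
static_assert(std::is_same<Wrapper<Meters>, Wrapper<double>>::value,
              "aliases do not create distinct specializations");
static_assert(std::is_same<Wrapper<Meters>::value_type, double>::value,
              "the member alias resolves to the canonical type");
```

Because `Wrapper<Meters>` and `Wrapper<double>` are one and the same type, a tool that wants to show `Meters` when `Wrapper<Meters>::value_type` is accessed has to carry the sugar separately from the canonical specialization.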
Project Proposal: URL
Project Reports: Final Report|Blog post
Mentors: Vassil Vassilev, Richard Smith
Google Summer of Code Student 2021
email: aua2@illinois.edu
Education: University of Illinois at Urbana-Champaign, Grainger College of Engineering
Project:
Modernize the LLVM “Building A JIT” Tutorial Series
The LLVM JIT API has changed many times over the years.
However, the official tutorials have failed to keep up.
This project aims to update the official “Building a JIT” tutorials to use
the latest version of the OrcJIT API and add new content that might be relevant
to new LLVM users interested in writing their own JIT compilers.
Project Proposal: URL
Project Reports:
Mentors: Lang Hames, Vassil Vassilev
Google Summer of Code Student 2020
email: gargvaibhav64@gmail.com
Education: Computer Science, Birla Institute of Technology and Science, Pilani, India
Completed project:
Enable Modules on Windows
ROOT has several features that interact with libraries and require implicit
header inclusion. This can be triggered by reading or writing data on disk,
or user actions at the prompt. Exposing the full shared library descriptors
to the interpreter at runtime translates into an increased memory footprint.
ROOT’s exploratory programming concepts allow implicit and explicit runtime
shared library loading. It requires the interpreter to load the library
descriptor. Re-parsing of descriptors’ content has a noticeable effect on
runtime performance. C++ Modules are designed to minimize the reparsing of
the same header content by providing an efficient on-disk representation of
the C++ code. C++ Modules have been implemented for Unix and OS X systems
already and it is expected that with the next release of ROOT, C++ Modules will
be the default on OS X. This project aims to extend the C++ Modules support to
Windows by implementing solutions compatible with the UNIX baseline, and to
present the corresponding performance results.
Project Proposal: URL
Project Reports: GSoC 2020 Archive
Mentors: Vassil Vassilev, Bertrand Bellenot
Google Summer of Code Student 2020
email: camolezi@usp.br
Education: Computer Engineering, University of São Paulo, Brazil
Completed project:
Reduce boost dependence in CMSSW
This project aims to find and reduce Boost dependencies in CMSSW.
Modern C++ introduced many features that were previously available only through
Boost packages. Thus, some Boost code can be replaced with similar C++
standard library features. Using standard features is good practice, and this
project will move the CMSSW codebase in that direction.
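A sketch of the kind of replacement described above, using two well-known Boost facilities that gained standard library equivalents in C++11. The `Track` struct and both helper functions are hypothetical and are not taken from the CMSSW codebase.

```cpp
#include <memory>
#include <string>

// Hypothetical data type used only for this illustration.
struct Track {
  double pt;
};

// Before (Boost): boost::shared_ptr<Track> t = boost::make_shared<Track>();
// After (standard library, C++11):
std::shared_ptr<Track> make_track(double pt) {
  auto track = std::make_shared<Track>();
  track->pt = pt;
  return track;
}

// Before (Boost): boost::lexical_cast<std::string>(run)
// After (standard library, C++11):
std::string run_label(int run) { return "run-" + std::to_string(run); }
```

Once every use in a header is migrated this way, the corresponding Boost include can be dropped.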
Project Proposal: URL
Project Reports: GSoC 2020 Archive
Mentors: Vassil Vassilev, David Lange
Google Summer of Code Student 2020
email: r.intval@gmail.com
Education: Mathematics and Computer Science, Voronezh State University, Russia
Project:
Extend clad to compute Jacobians
In mathematics and computer algebra, automatic differentiation (AD) is a
set of techniques to numerically evaluate the derivative of a function
specified by a computer program. Automatic differentiation is an alternative
technique to symbolic differentiation and numerical differentiation (the
method of finite differences). Clad is based on Clang, which provides the
necessary facilities for code transformation. The AD library is able to
differentiate non-trivial functions, to find a partial derivative for trivial
cases, and has good unit test coverage.
Currently, Clad does not provide an easy way to compute Jacobians.
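For readers unfamiliar with the term, the Jacobian of a function with several inputs and several outputs is the matrix of all first-order partial derivatives, with entry (i, j) holding the derivative of output i with respect to input j. The sketch below writes one out by hand for a small example function. The names `Vec2`, `Mat2`, `f` and `jacobian_f` are assumptions for this illustration and do not reflect Clad's interface.

```cpp
#include <array>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<Vec2, 2>;

// Example function f : R^2 -> R^2, f(x, y) = (x * y, x + y).
Vec2 f(const Vec2& in) {
  Vec2 out{};
  out[0] = in[0] * in[1];
  out[1] = in[0] + in[1];
  return out;
}

// Hand-written Jacobian of f: J[i][j] = d f_i / d x_j.
// An AD tool would generate the equivalent of this function automatically.
Mat2 jacobian_f(const Vec2& in) {
  Mat2 J{};
  J[0][0] = in[1];  // d(x*y)/dx = y
  J[0][1] = in[0];  // d(x*y)/dy = x
  J[1][0] = 1.0;    // d(x+y)/dx = 1
  J[1][1] = 1.0;    // d(x+y)/dy = 1
  return J;
}
```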
Project Proposal: URL
Project Reports: Poster
Mentors: Vassil Vassilev, Alexander Penev