Ioana Ifrim, Vassil Vassilev, David J Lange, GPU Accelerated Automatic Differentiation With Clad arXiv preprint arXiv:2203.06139 (2022).
- Automatic Differentiation (AD) is instrumental for science and industry.
It is a tool to evaluate the derivative of a function specified through a
computer program. The range of AD application domain spans from Machine
Learning to Robotics to High Energy Physics. Computing gradients with
the help of AD is guaranteed to be more precise than the numerical
alternative and have a low, constant factor more arithmetical operations
compared to the original function. Moreover, AD applications to domain
problems typically are computationally bound. They are often limited by
the computational requirements of high-dimensional parameters and thus can
benefit from parallel implementations on graphics processing units (GPUs).
Clad aims to enable differential analysis for C/C++ and CUDA and is a
compiler-assisted AD tool available both as a compiler extension and in ROOT.
Moreover, Clad works as a plugin extending the Clang compiler; as a plugin
extending the interactive interpreter Cling; and as a Jupyter kernel
extension based on xeus-cling. We demonstrate the advantages of parallel
gradient computations on GPUs with Clad. We explain how to bring forth a new
layer of optimization and a proportional speed up by extending Clad to
support CUDA. The gradients of well-behaved C++ functions can be
automatically executed on a GPU. The library can be easily integrated into
existing frameworks or used interactively. Furthermore, we demonstrate the
achieved application performance improvements, including (~10x) in ROOT
histogram fitting and corresponding performance gains from offloading to GPUs.
Marco Foco, Max Rietmann, Vassil Vassilev, Michael Wong, et. al., P2072R0: Differentiable programming for C++ (2020).
- Derivatives are vital a wide variety of computing applications, including
numerical optimization, solution of nonlinear equations, sensitivity
analysis, and nonlinear inverse problems. Virtually every process could be
described with a mathematical function. A mathematical function can be
thought of as an association between elements from different sets.
Derivatives can track how a varying quantity depends on another quantity,
for example how the position of a planet as the time varies. Derivatives and
gradients allows us to explore the properties of a function and thus the
described process as a whole. Also, gradients are an essential component in
gradient-based optimization methods, that have become more and more important
in recent years, in particular with its application training of (Deep) Neural
Networks. Derivatives can be computed numerically, but unfortunately the
finite differences methods are problematic due to the approximation operated
and the finite precision of floating point values used, and so the
implementation of the method faces precision and round off problems which
can affect the overall precision of the computation. This problem becomes
worse with higher order derivatives.
Javier López-Gómez, Javier Fernández, David del Rio Astorga, Vassil Vassilev, et. al., Relaxing the one definition rule in interpreted C++ (2020).
- Most implementations of the C++ programming language generate binary
executable code. However, interpreted execution of C++ sources has its own
use cases as the Cling interpreter from CERN’s ROOT project has shown. Some
limitations are derived from the ODR (One Definition Rule) that rules out
multiple definitions of entities within a single translation unit (TU). ODR
is there to ensure uniform view of a given C++ entity across translation
units. Ensuring uniform view of C++ entities helps when producing ABI
compatible binaries. Interpreting C++ presumes a single ever-growing
translation unit that define away some of the ODR use-cases. Therefore, it
may well be desirable to relax the ODR and, consequently, to support the
ability of developers to override any existing definition for a given
declaration. This approach is especially well-suited for iterative
prototyping. In this paper, we extend Cling, a Clang/LLVM …
Martin Vassilev, Vassil Vassilev, Alexander Penev, IDD – A Platform Enabling Differential Debugging Cybernetics and Information Technologies 20 53-67 (2020).
- Debugging is a very time consuming task which is not well supported by
existing tools. The existing methods do not provide tools enabling optimal
developers’ productivity when debugging regressions in complex systems. In
this paper we describe a possible solution aiding differential debugging.
The differential debugging technique performs analysis of the regressed
system and identifying the cause of the unexpected behavior by comparing to
a previous version of the same system. The prototype, idd, inspects two
versions of the executable – a baseline and a regressed version. The
interactive debugging session runs side by side both executables and allows
to examine and to compare various internal states. The architecture can work
with multiple information sources comparing data from different tools. We
also show how idd can detect performance regressions using information from
third-party performance facilities. We illustrate how in practice we can
quickly discover regressions in large systems such as the clang compiler.
Yuka Takahashi, Oksana Shadura, Vassil Vassilev, Migrating large codebases to C++ Modules Journal of Physics: Conference Series 1525 012051 (2020).
- ROOT has several features which interact with libraries and require implicit
header inclusion. This can be triggered by reading or writing data on disk,
or user actions at the prompt. Often, the headers are immutable, and
reparsing is redundant. C++ Modules are designed to minimize the reparsing
of the same header content by providing an efficient on-disk representation
of C++ Code. ROOT has released a C++ Modules-aware technology preview, which
intends to become the default for the ROOT 6.20 release.
Vassil Vassilev, David Lange, Malik Shahzad Muzzafar, Mircho Rodozov, et. al., C++ Modules in ROOT and Beyond arXiv preprint arXiv:2004.06507 (2020).
- C++ Modules come in C++ 20 to fix the long-standing build scalability
problems in the language. They provide an io-efficient, on-disk
representation capable to reduce build times and peak memory usage. ROOT
employs the C++ modules technology further in the ROOT dictionary system to
improve its performance and reduce the memory footprint.ROOT with C++ Modules
was released as a technology preview in fall 2018, after intensive
development during the last few years. The current state is ready for
production, however, there is still room for performance optimizations. In
this talk, we show the roadmap for making the technology default in ROOT.
We demonstrate a global module indexing optimization which allows reducing
the memory footprint dramatically for many workflows. We will report user
feedback on the migration to ROOT with C++ Modules.
Kim Albertsson Brann, Sitong An, Vassil Vassilev, Enric Tejedor Saavedra, et. al., arXiv: Software Challenges For HL-LHC Data Analysis (2020).
- The high energy physics community is discussing where investment is needed
to prepare software for the HL-LHC and its unprecedented challenges. The ROOT
project is one of the central software players in high energy physics since
decades. From its experience and expectations, the ROOT team has distilled a
comprehensive set of areas that should see research and development in the
context of data analysis software, for making best use of HL-LHC’s physics
potential. This work shows what these areas could be, why the ROOT team
believes investing in them is needed, which gains are expected, and where
related work is ongoing. It can serve as an indication for future research
proposals and cooperations.
Vassil Vassilev, Aleksandr Efremov, Oksana Shadura, Automatic Differentiation in ROOT arXiv preprint arXiv:2004.04435 (2020).
- In mathematics and computer algebra, automatic differentiation (AD) is
a set of techniques to evaluate the derivative of a function specified by a
computer program. AD exploits the fact that every computer program, no matter
how complicated, executes a sequence of elementary arithmetic operations
(addition, subtraction, multiplication, division, etc.), elementary functions
(exp, log, sin, cos, etc.) and control flow statements. AD takes source code
of a function as input and produces source code of the derived function. By
applying the chain rule repeatedly to these operations, derivatives of
arbitrary order can be computed automatically, accurately to working
precision, and using at most a small constant factor more arithmetic
operations than the original program.This paper presents AD techniques
available in ROOT, supported by Cling, to produce derivatives of arbitrary
C/C++ functions through implementing source code transformation and
employing the chain rule of differential calculus in both forward mode and
reverse mode. We explain its current integration for gradient computation in
TFormula. We demonstrate the correctness and performance improvements in
ROOT’s fitting algorithms.
Oksana Shadura, Brian Paul Bockelman, Vassil Vassilev, Extending ROOT through Modules EPJ Web of Conferences 214 05011 (2019).
- The ROOT software framework is foundational for the HEP ecosystem, providing
multiple capabilities such as I/O, a C++ interpreter, GUI, and math
libraries. It uses object-oriented concepts and build-time components to
layer between them. We believe that a new layering formalism will benefit
the ROOT user community. We present the modularization strategy for ROOT
which aims to build upon the existing source components, making available
the dependencies and other metadata outside of the build system, and allow
post-install additions on top of existing installation as well as in the ROOT
runtime environment. Components can be grouped into packages and made
available from repositories in order to provide a post-install step of
missing packages. This feature implements a mechanism for the more
comprehensive software ecosystem and makes it available even from a minimal
ROOT installation. As part of …
Oksana Shadura, Vassil Vassilev, Brian Paul Bockelman, Continuous Performance Benchmarking Framework for ROOT EPJ Web of Conferences 214 05003 (2019).
- Foundational software libraries such as ROOT are under intense pressure to
avoid software regression, including performance regressions. Continuous
performance benchmarking, as a part of continuous integration and other code
quality testing, is an industry best-practice to understand how the performance
of a software product evolves. We present a framework, built from industry best
practices and tools, to help to understand ROOT code performance and monitor
the efficiency of the code for several processor architectures. It additionally
allows historical performance measurements for ROOT I/O, vectorization and
parallelization sub-systems.
Y Takahashi, V Vassilev, O Shadura, R Isemann, Optimizing Frameworks’ Performance Using C++ Modules Aware ROOT EPJ Web of Conferences 214 02011 (2019).
- ROOT is a data analysis framework broadly used in and outside of High Energy
Physics (HEP). Since HEP software frameworks always strive for performance
improvements, ROOT was extended with experimental support of runtime C++
Modules. C++ Modules are designed to improve the performance of C++ code
parsing. C++ Modules offers a promising way to improve ROOT’s runtime
performance by saving the C++ header parsing time which happens during ROOT
runtime. This paper presents the results and challenges of integrating C++
Modules into ROOT.
V Vassilev, Optimizing ROOT’s Performance Using C++ Modules Journal of Physics: Conference Series 898 1742-6596 (2016).
- ROOT comes with a C++ compliant interpreter cling. Cling needs to understand
the content of the libraries in order to interact with them. Exposing the
full shared library descriptors to the interpreter at runtime translates
into increased memory footprint. ROOT’s exploratory programming concepts
allow implicit and explicit runtime shared library loading. It requires the
interpreter to load the library descriptor. Re-parsing of descriptors’
content has a noticeable effect on the runtime performance. Present
state-of-art lazy parsing technique brings the runtime performance to
reasonable levels but proves to be fragile and can introduce correctness
issues. An elegant solution is to load information from the descriptor lazily
and in a non-recursive way.The LLVM community advances its C++ Modules
technology providing an io-efficient, on-disk representation capable to
reduce build times and peak memory usage. The feature …
V Vassilev, Native Language Integrated Queries with CppLINQ in C++ Journal of Physics: Conference Series 608 012030 (2015).
- Programming language evolution brought to us the domain-specific languages
(DSL). They proved to be very useful for expressing specific concepts,
turning into a vital ingredient even for general-purpose frameworks.
Supporting declarative DSLs (such as SQL) in imperative languages (such as
C++) can happen in the manner of language integrated query (LINQ).
V Vassilev, M Vassilev, A Penev, L Moneta, et. al., Clad—automatic differentiation using Clang and LLVM Journal of Physics: Conference Series 608 012055 (2015).
- Differentiation is ubiquitous in high energy physics, for instance in
minimization algorithms and statistical analysis, in detector alignment and
calibration, and in theory. Automatic differentiation (AD) avoids well-known
limitations in round-offs and speed, which symbolic and numerical
differentiation suffer from, by transforming the source code of functions. We
will present how AD can be used to compute the gradient of multi-variate
functions and functor objects. We will explain approaches to implement an AD
tool. We will show how LLVM, Clang and Cling (ROOT’s C++ 11 interpreter)
simplifies creation of such a tool. We describe how the tool could be
integrated within any framework. We will demonstrate a simple proof-of-concept
prototype, called Clad, which is able to generate n-th order derivatives of
C++ functions and other language constructs. We also demonstrate how Clad can
offload laborious computations …
Violeta Ilieva, Vassil Vassilev, clad - Automatic Differentiation with Clang (2013).
- Differentiation is the process of finding a derivative, which measures
how a function changes as its input changes. Derivatives can be evaluated
with machine precision accuracy through a method called automatic
differentiation. Unlike other methods for differentiation, including
numerical and symbolic, automatic differentiation yields exact derivatives
even of complicated functions at relatively low processing and storage costs.
V Vassilev, Ph Canal, A Naumann, P Russo, Cling–the new interactive interpreter for ROOT 6 (2012).
- Cling is an interactive C++ interpreter, built on top of Clang and LLVM
compiler infrastructure. Like its predecessor Cint, Cling realizes the
read-print-evaluate-loop concept, in order to leverage rapid application
development. Implemented as a small extension to LLVM and Clang, the
interpreter reuses their strengths such as the praised concise and expressive
compiler diagnostics. We show how to match the interpreter concept to the
compiler library and generalize common set of requirements for building up
an interactive interpreter. We reason the design and implementation
decisions as solution to the challenge of implementing interpreter behaviour
as an extension of the compiler library. We present the new features, eg
how C++11 will come to Cling and how CINT-specific extensions are being
adopted. We clarify the state of integration in the ROOT framework and the
induced change set. We explain how ROOT …