Clad as First-Class Citizen in LibTorch
This project will design, implement, benchmark, and integrate a
proof-of-concept that uses Clad (compiler-based automatic differentiation)
as a first-class gradient engine in LibTorch (the C++ API of PyTorch). The
goal is to demonstrate how ROOT users can run high-performance, pure-C++
machine-learning training and inference pipelines, without relying on
Python. The project will result in a working prototype that integrates
Clad-generated backward routines into LibTorch via
torch::autograd::Function or custom ATen operators.
Recent efforts have extended the ROOT framework with modern machine-learning capabilities. In particular, a ROOT Users Workshop 2025 contribution by Meyer-Conde et al. demonstrates the use of LibTorch directly inside ROOT for gravitational-wave data analysis [1]. Their “ROOT+” prototype library augments ROOT with advanced features such as complex tensor arithmetic on CPU/GPU and modern I/O mechanisms (HTTP, Kafka), while relying on LibTorch for ML training and inference. In practice, this enables ROOT to load and execute neural networks (via ONNX or LibTorch) entirely in C++, and to combine them seamlessly with ROOT’s data-processing tools such as RDataFrame and TMVA, all within a single environment.
In parallel, recent work in the Compiler Research community has demonstrated that Clad-generated gradients can match and even outperform PyTorch autograd on CPU when carefully optimized [2]. These results motivate a deeper exploration of compiler-driven automatic differentiation as a backend for machine learning frameworks. Building on both efforts, this project will culminate in a ROOT integration demo (for example, a simplified gravitational-wave analysis workflow) and a reproducible benchmarking suite comparing Clad-based gradients with PyTorch autograd for realistic HEP and GW workloads.
This project is expected to deliver tangible performance and usability benefits for machine-learning workflows in ROOT. By offloading gradient computation to Clad’s compiler-generated routines, meaningful speedups are expected for CPU-bound training workloads; prior results report speedups over PyTorch autograd on CPU [2]. This makes the approach particularly attractive for offline HEP and gravitational-wave analyses, where CPU efficiency is often a limiting factor. In addition, the project will enable fully native C++ machine-learning workflows in ROOT, allowing users to define, train, and evaluate models without Python dependencies and to integrate ML tightly with existing C++ analysis code, ROOT I/O, and data pipelines. The Clad-enhanced LibTorch backend will naturally complement ROOT’s existing ML ecosystem including TMVA, SOFIE, ONNX-based inference, and RDataFrame providing a flexible “best-of-both-worlds” solution that combines modern deep-learning frameworks with ROOT’s mature analysis infrastructure. Beyond the immediate prototype, this work will establish a solid foundation for future research on compiler-driven optimizations such as kernel fusion, reduced memory traffic, and eventual GPU acceleration.
Task ideas and expected results
- Create a small C++ demo where a simple neural network is defined
(e.g. MLP) and use Clad to generate its derivative functions. Integrate
this with LibTorch by wrapping the Clad-generated gradient code as a
custom
torch::autograd::Functionor operator. This follows the strategy outlined in the Clad-PyTorch project. The result is a model that uses LibTorch tensors for forward, but Clad’s code for backward. - Measure training (forward+backward) performance on CPU for representative tasks (e.g. MNIST or a simple GW signal classification). Compare Clad-derived gradients vs PyTorch autograd. Focus on performance: optimize memory layout and avoid dynamic allocations to maximize throughput.
- Adapt the working prototype into the ROOT framework. For example,
incorporate it into a ROOT macro or plugin so that one can run C++ ML code
under
root.exeor in PyROOT. Provide examples using ROOT’s data structures (TTrees, RDataFrame) feeding into the Clad-empowered model. Investigate loading pretrained models (via ONNX or TorchScript) and whether Clad can backpropagate through them.