GSoC 2024 | Learning Quantum Representations of Classical High-Energy Physics Data with Contrastive Learning
Google Summer of Code @ ML4Sci
This year Google celebrates the 20th anniversary of Google Summer of Code (GSoC), with 1,220 contributors writing code for 195 open-source mentoring organizations. As a GSoC 2024 contributor, I worked in Machine Learning for Science (ML4Sci), an open-source organization that brings together modern machine learning techniques and applies them to cutting-edge problems in STEM. Over the summer, I worked on Quantum Machine Learning applied to High Energy Physics data (QMLHEP), contrastively training models to output embeddings that can be used for downstream tasks like classification.
Here is the code repository: Quantum_SSL_for_HEP_Sanya_Nanda
Project
Learning quantum representations of classical high-energy physics data with contrastive learning
Background
LHC: Large Hadron Collider
At the LHC, scientists are looking into the unknown and probing the most fundamental questions about our Universe, like: "What is the Universe really made of? What forces act within it?". The Large Hadron Collider (LHC) is the world's largest and most powerful particle accelerator, located at CERN in Geneva, Switzerland. It features a 27-kilometer ring of superconducting magnets, designed to accelerate particles to near the speed of light and collide them to explore fundamental physics. The collider has been instrumental in significant discoveries, including the Higgs boson. In total, the LHC hosts seven separate experiments, a few of which can be seen in Figure 1.
CMS: Compact Muon Solenoid
CMS is one of the detectors located at CERN's LHC, designed to investigate a broad range of physics. The CMS detector is massive and has a cylindrical, onion-like structure with multiple layers of detectors. These layers allow CMS to capture detailed photographs of the particle collisions occurring at the LHC. On top of that, CMS measures the properties of well-known particles with unprecedented precision and is on the lookout for completely new and unknown phenomena.
What is High Energy Physics at CMS?
Bending Particles
A powerful solenoid magnet bends the trajectories of charged particles as they fly outwards from the collision point. This helps to identify the charge of the particle and measure its momentum.
Identifying Tracks
The path of the bent charged particles is calculated with very high precision using a silicon tracker consisting of many electronic sensors arranged in concentric layers. When a charged particle flies through the Tracker layer, it interacts electromagnetically with the silicon and produces a hit. These hits are then joined together to identify the track of the traversing particle. The Tracker layer can be seen in Figure 2.
Measuring Energy
The energies of the various particles produced in each collision are crucial to understanding what occurred at the collision point. This information is collected from the two calorimeters in CMS, as marked in Figure 2:
- Electromagnetic Calorimeter (ECAL): the inner of the two layers; it measures the energy of electrons and photons by stopping them completely.
- Hadron Calorimeter (HCAL): the outer layer; it stops hadrons, which are composite particles made up of quarks and gluons that fly through the ECAL.
Data: Quark Gluon Dataset from CMS
The CERN CMS Open Data Portal makes simulated data from experiments available; this was used to derive the Quark-Gluon dataset by S. Gleyzer et al. [1]. The goal of this project is to discriminate between quark-initiated and gluon-initiated jets in this dataset. The dataset consists of 933,206 three-channel images of shape 125x125, with an equal number of quark and gluon jets. The three channels correspond to measurements from the components of the CMS detector discussed above: Track, ECAL and HCAL. The mean over all images of this dataset is depicted in Figures 3-4.
Quarks: Fundamental particles that make up protons and neutrons.
Gluons: Force carrier particles that mediate the strong force between quarks.
More details about this dataset, along with quark-gluon properties and other datasets used in this project, can be found in my Mid Term GSoC Blog.
Data Preprocessing
Data preprocessing is of high significance to ensure good-quality data for model training; in the machine learning cycle, it is one of the most important steps for producing a well-performing model.
For the computer vision models used in the contrastive learning framework, the 3 channels of the Quark-Gluon dataset were first analysed from a physics perspective, as explained above, and then the images were preprocessed as shown in this code. The preprocessing techniques used included color jittering, Gaussian blur and z-scale normalisation. A new channel was introduced by superimposing the preprocessed channels 1-3. Figure 5 is a sample image with the 4 channels and Figure 6 is the overall mean; the 4th channel has a wider mean than the 3rd channel due to the superimposition.
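As a rough sketch of this pipeline (the exact implementation is in the linked code), assuming PyTorch and torchvision; the jitter and blur parameters and the superimposition helper below are illustrative rather than the project's exact values:

```python
import torch
from torchvision import transforms

# Illustrative augmentations: parameter values are placeholders,
# not the exact settings used in the project.
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.GaussianBlur(kernel_size=3, sigma=(0.1, 1.0)),
])

def z_scale(img: torch.Tensor) -> torch.Tensor:
    """Z-score normalise each channel independently."""
    mean = img.mean(dim=(1, 2), keepdim=True)
    std = img.std(dim=(1, 2), keepdim=True)
    return (img - mean) / (std + 1e-8)

def add_superimposed_channel(img: torch.Tensor) -> torch.Tensor:
    """Append a 4th channel formed by superimposing Track, ECAL and HCAL."""
    combined = img.sum(dim=0, keepdim=True)        # (1, 125, 125)
    return torch.cat([img, combined], dim=0)       # (4, 125, 125)

x = torch.rand(3, 125, 125)                        # one Track/ECAL/HCAL image
x = add_superimposed_channel(z_scale(augment(x)))  # -> (4, 125, 125)
```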
Next, pairs/views were created from the preprocessed data to pass as input to the contrastive learning framework. Figure 7 depicts one such sample.
The logic behind creating views or pairs for contrastive learning centres on positive views. A positive view is created by taking an image and pairing it with an augmented version of itself; this is done for both quark (label 0) and gluon (label 1) samples. A pair built from a sample and its own augmentation is called similar and assigned the new pair label 1, while a pair built from two different samples is called dissimilar and labelled 0. In other words, views from the same sample are positive and views from different samples are negative. While training the model, every pair other than a sample's own views is considered negative. The notion of positive and negative pairs exists only in the loss function; the model itself has no concept of views. It is the loss function that nudges the model towards clustering similar samples together. More on this is detailed in the contrastive learning section.
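A minimal sketch of this pairing logic, assuming an `augment` transform like the one above; `make_pairs` is a hypothetical helper, not the project's exact function:

```python
import random

def make_pairs(images, augment, n_pairs=1000):
    """Build (view1, view2, pair_label) triplets for contrastive training.

    pair_label = 1: both views come from the same sample (positive).
    pair_label = 0: the views come from two different samples (negative).
    """
    pairs = []
    n = len(images)
    for _ in range(n_pairs):
        i = random.randrange(n)
        if random.random() < 0.5:                  # positive pair
            pairs.append((images[i], augment(images[i]), 1))
        else:                                      # negative pair
            j = random.choice([k for k in range(n) if k != i])
            pairs.append((images[i], augment(images[j]), 0))
    return pairs
```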
Similarly, for the graph-based models the data was preprocessed and views were created as explained above. Figure 8 is a sample of views used for graph-based contrastive learning. The weights used in the graph are physics-informed. There were 12,500 graphs, or jets, with 8 features per node depicting physics-based attributes, as follows (a graph-construction sketch follows the list):
- p_T: Transverse Momentum is a measure of the momentum of a particle perpendicular to the beamline or collision axis.
- y: Rapidity is a measure of how the particle's velocity compares to the speed of light in the direction of motion along the beamline.
- phi: Azimuthal Angle is the angle in the transverse plane, ranging from 0 to 2pi. It describes the direction of a particle's momentum perpendicular to the beamline.
- m: Rest mass of the particle.
- E: Total energy of the particle, i.e. its rest energy plus kinetic energy.
- px: The momentum component along the x-axis.
- py: The momentum component along the y-axis.
- pz: The momentum component along the z-axis.
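As a sketch of how one jet could be turned into such a graph, assuming PyTorch Geometric; the k-nearest-neighbour connectivity and the inverse-ΔR edge weights below are a common physics-informed choice, used here for illustration only:

```python
import torch
from torch_geometric.data import Data
from torch_geometric.nn import knn_graph

def jet_to_graph(constituents: torch.Tensor, label: int, k: int = 4) -> Data:
    """constituents: (num_particles, 8) rows of [pT, y, phi, m, E, px, py, pz]."""
    # Connect each particle to its k nearest neighbours in the (y, phi) plane
    # (ignoring the 2*pi wrap-around of phi for brevity).
    pos = constituents[:, 1:3]                 # rapidity and azimuthal angle
    edge_index = knn_graph(pos, k=k)
    # Physics-informed edge weight: inverse angular separation Delta-R.
    src, dst = edge_index
    delta_r = (pos[src] - pos[dst]).norm(dim=1)
    edge_weight = 1.0 / (delta_r + 1e-8)
    return Data(x=constituents, edge_index=edge_index,
                edge_attr=edge_weight.unsqueeze(1),
                y=torch.tensor([label]))
```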
Contrastive Learning
Representation learning involves extracting meaningful, compressed representations of data for various downstream tasks like classification, transfer learning etc. Contrastive learning, a form of representation learning, quantifies similarity or dissimilarity between data elements by contrasting positive/similar and negative/dissimilar pairs in the feature space.
Objective:
The primary objective is to minimize the distance between positive pairs while maximizing the distance between negative pairs in the learned embedding space. This creates a representation where similar data points are clustered together, and dissimilar ones are well-separated. This enables the model to capture the underlying structure and semantic relationships within the data without requiring explicit labels.
Contrastive Loss Functions:
1. Contrastive Pair Loss
The contrastive loss function operates on pairs of samples, encouraging the model to bring similar pairs closer in the embedding space while pushing dissimilar pairs apart.
The contrastive loss for a pair of samples \( (x_1, x_2) \) with label \( y \) is defined as:
\[ \mathcal{L} = y \cdot D^2 + (1 - y) \cdot \max(0, m - D)^2 \]
Where:
- \( D \) is the Euclidean distance between the embeddings of \( x_1 \) and \( x_2 \).
- \( y = 1 \) if \( x_1 \) and \( x_2 \) are similar, else \( y = 0 \).
- \( m \) is the margin that defines the minimum distance for dissimilar pairs.
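A direct PyTorch implementation of this loss might look as follows (the margin value is illustrative):

```python
import torch
import torch.nn.functional as F

def contrastive_pair_loss(z1, z2, y, margin=1.0):
    """Contrastive pair loss. y = 1 for similar pairs, 0 for dissimilar."""
    d = F.pairwise_distance(z1, z2)                        # Euclidean distance D
    loss = y * d.pow(2) + (1 - y) * F.relu(margin - d).pow(2)
    return loss.mean()
```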
2. InfoNCE Loss
InfoNCE is a type of contrastive loss that leverages multiple negative samples within a batch to improve representation learning.
The InfoNCE loss for a positive pair \( (i, j) \) is defined as:
\[ \mathcal{L}_{\text{InfoNCE}} = -\log \frac{\exp(\text{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2N} \mathbb{1}_{k \neq i} \exp(\text{sim}(z_i, z_k)/\tau)} \]
Where:
- \( z_i \) and \( z_j \) are the embeddings of the positive pair.
- \( \text{sim}(a, b) \) is a similarity function, typically cosine similarity.
- \( \tau \) is a temperature parameter that scales the logits.
- \( N \) is the batch size, and \( 2N \) accounts for all positive and negative pairs in the batch.
- \( \mathbb{1}_{k \neq i} \) is an indicator function that excludes the positive pair from the denominator.
3. NT-Xent Loss (Normalized Temperature-scaled Cross Entropy Loss)
NT-Xent is a specific formulation of the InfoNCE loss that emphasizes normalization over positive and negative pairs. It is essentially the same as InfoNCE but incorporates batch-wise negatives and ensures symmetry in the loss computation; the formula is the same as above. More about the variety of contrastive loss functions can be found here.
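A compact NT-Xent sketch, assuming a batch of N samples with two augmented views each (the SimCLR-style formulation):

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, tau=0.5):
    """z1, z2: (N, d) projections of the two views of the same N samples."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit norm
    sim = z @ z.t() / tau                                # cosine similarities
    n = z1.size(0)
    # Mask out self-similarity so it never appears in the denominator.
    sim.fill_diagonal_(float('-inf'))
    # For index i, its positive sits n positions away in the stacked batch.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)                 # -log softmax at positive
```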
Model Architecture:
Following are the four major components that make up the contrastive learning framework [3]:
1. Data Augmentation
It generates multiple views of the same data point, which are treated as positive pairs. The diversity introduced by augmentation helps the model learn invariant features.
2. Encoder Network
The encoder network transforms raw input data into high-dimensional embeddings or feature vectors. Its primary objective is to capture the underlying structure of the data, facilitating effective comparison between data points.
In our case, for the MNIST and Quark-Gluon image datasets, CNN and ResNet encoders were used. For the graph views of the Quark-Gluon data, a GNN and its quantum hybrid were used as the encoder, returning embeddings in a high-dimensional space upon training.
3. Projection Head
The projection head maps the high-dimensional embeddings produced by the encoder into a lower-dimensional space where the contrastive loss is applied. This separation allows the encoder to learn features beneficial for downstream tasks, while the projection head focuses on the contrastive objective. Numerous quantum versions of the projection head were experimented with in this study by introducing quantum layers.
4. Loss Function
The loss function quantifies how well the model distinguishes between positive and negative pairs. It guides the optimization process by providing gradients that adjust the model parameters to improve performance. Some examples used in this project were explained above.
Encoder Networks & Projection Head:
Graph Neural Networks optimise transformations on all attributes of a graph, at the node, edge and global level, while preserving its symmetries. The GNN encoder used has GATConv layers, followed by batch normalization to stabilize training by normalizing the embeddings. Furthermore, residual connections were added between layers to help prevent vanishing gradients and improve training. As the projection head, a combination of mean and max pooling was used to capture a richer set of information from the nodes, improving performance on graph classification tasks.
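A sketch of an encoder in this spirit, assuming PyTorch Geometric; the layer sizes and number of GATConv blocks are illustrative, not the project's exact architecture:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv, global_mean_pool, global_max_pool

class GNNEncoder(nn.Module):
    def __init__(self, in_dim=8, hidden=64, out_dim=128):
        super().__init__()
        self.conv1 = GATConv(in_dim, hidden)
        self.conv2 = GATConv(hidden, hidden)
        self.bn1 = nn.BatchNorm1d(hidden)
        self.bn2 = nn.BatchNorm1d(hidden)
        self.proj = nn.Linear(2 * hidden, out_dim)   # maps pooled summary to embedding

    def forward(self, x, edge_index, batch):
        h = torch.relu(self.bn1(self.conv1(x, edge_index)))
        h = torch.relu(self.bn2(self.conv2(h, edge_index))) + h   # residual connection
        # Concatenate mean and max pooling for a richer graph-level summary.
        g = torch.cat([global_mean_pool(h, batch),
                       global_max_pool(h, batch)], dim=1)
        return self.proj(g)
```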
The Convolutional Neural Network follows the same framework and is detailed in the Mid Term GSoC Blog. The ResNet used is a further enhancement over the computer vision models above. Experimentation with quantum-based projection heads was conducted thoroughly and is detailed in a section below.
Training the model:
- Data Augmentation: For each input data point, apply different augmentations to create a positive pair.
- Encoding: Pass augmented views through the encoder network to obtain embeddings.
- Projection: Use the projection head to map the embeddings into the latent space where contrastive loss is applied.
- Loss Computation: Calculate the contrastive loss using the positive pair and a set of negative samples.
- Backpropagation: Update the encoder and projection head parameters to minimize the loss.
- Iteration: Repeat the process for multiple epochs until the representations converge.
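Putting these steps together, a minimal training-loop sketch assuming the pair-based setup and the `contrastive_pair_loss` defined earlier; `encoder`, `head` and `loader` are placeholders, not the project's exact objects:

```python
import torch

num_epochs = 50                                   # illustrative value
# encoder, head: nn.Module instances like those sketched above;
# loader yields (view1, view2, pair_label) batches. All are placeholders.
params = list(encoder.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

for epoch in range(num_epochs):
    for v1, v2, y in loader:
        z1, z2 = head(encoder(v1)), head(encoder(v2))    # embed both views
        loss = contrastive_pair_loss(z1, z2, y.float())  # loss defined earlier
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```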
Model Evaluation
After training was complete, all models were evaluated as detailed in this section. All experiments were tracked using Weights & Biases (wandb). Below, we look at results from the classical GNN encoder network on Quark-Gluon views (GNN wandb report) and the classical CNN encoder on MNIST 3-8 views (CNN wandb report), both using the contrastive pair loss. All other models were evaluated in the same way, and their wandb reports and results are logged in the next section on benchmarking.
Evaluation 1: Learning History
Learning history is logged across the epochs while training the model on train and validation datasets. Figure 9 shows the training and validation learning curve for classical GNN encoder network when running on Quark-Gluon graph views.
Evaluation 2: Test Embeddings Plot using t-SNE
The embeddings projected by the CNN encoder for the MNIST 3-8 dataset are represented in Figure 10. The embeddings live in a higher-dimensional space and were reduced to 3 dimensions in the plot below using the t-SNE dimensionality reduction technique.
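A sketch of the reduction step, assuming the embeddings are collected as a NumPy array and using scikit-learn's TSNE; the placeholder array stands in for real encoder outputs:

```python
import numpy as np
from sklearn.manifold import TSNE

# embeddings: (num_samples, embed_dim) array collected from the encoder.
embeddings = np.random.rand(500, 128)          # placeholder for real embeddings
coords3d = TSNE(n_components=3, perplexity=30).fit_transform(embeddings)
print(coords3d.shape)                          # (500, 3), ready for a 3-D scatter
```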
Evaluation 3: Test Predictions
In Figure 11, a test prediction on a positive sample is plotted along with its high-dimensional embedding vector. Similar values appear at the same positions in the two embeddings, indicating that the two samples of the digit 3 are mapped close to each other. In Figure 12, by contrast, the embeddings of 3 and 8 are clearly distant from each other.
Evaluation 4: Downstream Task - Linear Classification Test
The quality of the generated embeddings can be tested by using them for downstream tasks and evaluating those tasks for effectiveness. For the GNN encoder, a linear classification test is implemented, and its effectiveness is measured using a confusion matrix (Figure 13) and the AUC-ROC curve (Figure 14).
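A sketch of such a linear probe, assuming frozen embeddings and scikit-learn; the arrays below are placeholders for the encoder outputs and labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, confusion_matrix

# Placeholder embeddings/labels; in practice these come from the frozen encoder.
train_emb, train_y = np.random.rand(800, 128), np.random.randint(0, 2, 800)
test_emb, test_y = np.random.rand(200, 128), np.random.randint(0, 2, 200)

clf = LogisticRegression(max_iter=1000).fit(train_emb, train_y)
scores = clf.predict_proba(test_emb)[:, 1]     # probability of class 1 (gluon)
print("AUC:", roc_auc_score(test_y, scores))
print(confusion_matrix(test_y, clf.predict(test_emb)))
```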
High Energy Physics datasets are generally complicated to work with and an AUC above 0.7 is considered good.
In contrast, the CNN encoder on MNIST performs well in fewer epochs due to the simplicity of the dataset: its AUC approaches 1 and the confusion matrix (Figure 15) shows that the model makes almost no mistakes. The same CNN, when applied to the Quark-Gluon images, does not perform well.
Quantum Hybrid Models
Quantum machine learning combines quantum computing and classical machine learning to enhance data processing and model performance by leveraging three main quantum-mechanical principles: interference, superposition and entanglement. Quantum machine learning aims to solve complex problems faster and more efficiently than classical algorithms, since qubits, the quantum analogue of classical bits, can store more information at a given time due to superposition. Following are the quantum components used in the model architecture of this study:
- Quantum Projection Head:
The classical projection head after the encoder layers is replaced with a quantum circuit-based projection head using quantum layers. This potentially captures more complex relationships in the embedding space.
- Quantum Layer:
This is the layer where the parameterized quantum circuit (PQC) is applied using PennyLane. It processes the embeddings from the classical encoder layers.
- Quantum Circuit:
A simple quantum circuit applies parameterized rotations to the qubits and entangling gates between them, after which expectation values are measured.
Parameterized Quantum Circuits (PQCs) are fundamental building blocks in quantum machine learning. PQCs consist of quantum gates with tunable parameters, optimized during ML training. Following is an overview of the primary PQCs used in this project:
Figure 16 shows a quantum circuit with Ry rotations followed by entanglement.
Figure 17 shows a quantum circuit with the angle embedding template from PennyLane followed by entanglement.
Figure 18 shows a quantum circuit with amplitude embedding followed by entanglement.
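As a rough PennyLane sketch of circuits along these lines; the qubit count and layer structure are illustrative, not the exact circuits in the figures:

```python
import pennylane as qml
from pennylane import numpy as np

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def projection_circuit(inputs, weights):
    # Encode classical embedding values as rotation angles (cf. Figure 17).
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    # Trainable Ry rotations followed by a ring of entangling CNOTs (cf. Figure 16).
    for i in range(n_qubits):
        qml.RY(weights[i], wires=i)
    for i in range(n_qubits):
        qml.CNOT(wires=[i, (i + 1) % n_qubits])
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

out = projection_circuit(np.random.rand(n_qubits), np.random.rand(n_qubits))
```

In the hybrid models, a QNode along these lines could be wrapped, for instance with PennyLane's `qml.qnn.TorchLayer`, so that it slots in after the classical encoder as the quantum projection head.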
These are the main circuits used in the experiments; the number of qubits and layers was varied, and some other samples can be found in the Mid Term GSoC Blog.
Quantum Fidelity
Fidelity is the quantum equivalent of a similarity score between two quantum states. It was used in one of the quantum-hybrid experiments, alongside the previously defined loss functions, to observe its effect on the embeddings.
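For pure states, fidelity reduces to the squared overlap \( F = |\langle \psi | \phi \rangle|^2 \); below is a minimal sketch computing it from two PennyLane state vectors (the circuit is illustrative):

```python
import pennylane as qml
import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def state(angles):
    qml.RY(angles[0], wires=0)
    qml.RY(angles[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.state()                         # full state vector

psi = state(np.array([0.1, 0.4]))
phi = state(np.array([0.2, 0.3]))
fidelity = np.abs(np.vdot(psi, phi)) ** 2      # F = |<psi|phi>|^2, in [0, 1]
print(fidelity)
```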
Benchmarking
Table 1 illustrates the results of the CNN encoder on MNIST and Quark-Gluon image pairs. The first round of experiments was conducted on pairs of MNIST digits that look alike: 0-1, 3-8, 9-6. MNIST was used for quick experimentation and validation of the approach, and the results demonstrated the utility of the framework. Good results were achieved with the classical computer-vision-based contrastive learning model, as shown in Table 1. The respective wandb reports for each run can be viewed for a detailed inspection of the results. The model performed only moderately on the Quark-Gluon data, so the subsequent experiments focused on this dataset to log improvements over this baseline.
| Dataset | Model | Validation Loss | Validation Accuracy | WandB Report |
|---|---|---|---|---|
| 0-1 MNIST | CNN Encoder + contrastive pair | 0.000911 | 0.9997 | Report 1 |
| 3-8 MNIST | CNN Encoder + contrastive pair | 0.004080 | 0.9977 | Report 2 |
| 9-6 MNIST | CNN Encoder + contrastive pair | 0.002580 | 0.9994 | Report 3 |
| Quark-Gluon | CNN Encoder + contrastive pair | 0.4921 | 0.5617 | No Report |
Table 2 shows the results of different classical encoders on the Quark-Gluon dataset. The first row repeats the last row of Table 1 and shows the performance of the CNN encoder. The next row shows the results from ResNet18, which are promising and can be enhanced further. Fewer data samples were used with ResNet, as it is a large model and naturally time-consuming to train. Finally, GNNs were explored and gave good results, with an AUC nearing 0.8, which indicates a good model, especially for an HEP dataset given its inherent complexity. For more details of the GNN encoder run, see Report 4.
| Model | Test Accuracy (%) | AUC |
|---|---|---|
| CNN Encoder | 56.17 | 0.52 |
| ResNet18 Encoder | 60.02 | 0.5416 |
| GNN Encoder | 73.28 | 0.7984 |
Table 3 compares the classical and quantum GNN models on the Quark-Gluon dataset, since the GNN performed best in Table 2, a natural outcome given the sparse, point-cloud nature of the particle data. Quantum hybrids were tried with most of the models, but only the results with the GNN are shown below for easy comparison. The quantum hybrid models achieve performance comparable to the classical model, even though they are currently constrained by factors such as the number of usable qubits. Report 5 shows that the quantum hybrids work better within 10 epochs, while Report 6 shows the classical GNN taking over with more epochs.
Conclusion
It can be concluded that quantum and classical contrastive learning work effectively on MNIST as well as HEP datasets like Quark-Gluon. It is noteworthy that a simple CNN-encoder-based model is almost always correct on the MNIST dataset within a small number of iterations, but the same model does not perform as well on the HEP data. Upgrading the computer vision model to a ResNet18 encoder shows improvement on the HEP dataset. Moreover, implementing GNN-based encoders by converting the HEP particle cloud data to graphs shows considerable improvement in accuracy on the downstream tasks. The quantum hybrid models show comparable, and sometimes slightly better, performance in terms of AUC on the datasets used. In conclusion, all the experiments conducted show the viability of using representation learning, from both classical and quantum perspectives, on HEP datasets to generate embeddings that can be meaningfully used in downstream tasks. Studies like these conducted at ML4Sci help ensure that the LHC is equipped for its next wave of experiments, which will generate huge volumes of data and require ML models to make sense of it all more efficiently. Research of this kind at the LHC deepens our understanding of what exists and can eventually spark new technologies that change the world we live in.
Future Scope
The experiments so far used 12.5k data points, while the complete dataset has 933k. Experiments on the complete Quark-Gluon dataset are therefore required to learn more and observe the effect on the current results. Additionally, the existing models can be tuned further for better performance.
Experimenting with more loss functions and architectures is a crucial next step. Many frameworks from the literature, such as MoCo and BYOL, were not tested. Experiments with a larger variety of quantum circuits from the literature would be beneficial, and trying fully quantum models is a promising next step.
Acknowledgment
I would love to start by acknowledging all the unwavering support showered throughout the program by my mentors and co-mentees. I want to extend my deepest gratitude to my mentors and Professors Sergei Gleyzer, KC Kong, Katia Matcheva, Konstantin Matchev, Myeonghun Park, and Gopal Ramesh Dahale; who have guided me with invaluable insights. Their constant encouragement has been a source of inspiration and motivation for me. I am truly grateful for the time and effort they have invested in nurturing my skills, broadening my horizons and deepening my knowledge.
To my co-mentees, Amey Bhatuse and Duy Do Lee, I deeply appreciate the camaraderie, shared learning, and mutual support we've offered each other. Together, we have navigated challenges, celebrated successes, and grown stronger. To all the other ML4Sci GSoC contributors and their amazing work! It was always a delight to get on a call with everyone and learn from everyone's experiences. For me, the best part was being part of such a dedicated community working towards quantum computing. The global perspective of the team helped me understand different points of view and approaches to solve the problem at hand.
Kudos to the GSoC organizers and leads for such a phenomenal job at managing the program in its entirety and bringing together mentors and mentees for a collaboration of such a huge scale!
References
[1] M. Andrews, J. Alison, S. An, B. Burkle, S. Gleyzer, M. Narain, M. Paulini, B. Poczos, E. Usai, End-to-end jet classification of quarks and gluons with the CMS Open Data, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, Volume 977, 2020, 164304, ISSN 0168-9002.
[2] You, Y., Chen, T., Sui, Y., Chen, T., Wang, Z. and Shen, Y., 2020. Graph contrastive learning with augmentations. Advances in neural information processing systems, 33, pp.5812-5823.
[3] Le-Khac, P.H., Healy, G. and Smeaton, A.F., 2020. Contrastive representation learning: A framework and review. IEEE Access, 8, pp.193907-193934.
[4] A. Hammad, Kyoungchul Kong, Myeonghun Park and Soyoung Shim, Quantum Metric Learning for New Physics Searches at the LHC, 2023
[5] Jaderberg, B., Anderson, L.W., Xie, W., Albanie, S., Kiffner, M. and Jaksch, D., 2022. Quantum self-supervised learning. Quantum Science and Technology, 7(3), p.035005.
[6] Jaiswal, A., Babu, A.R., Zadeh, M.Z., Banerjee, D. and Makedon, F., 2020. A survey on contrastive self-supervised learning. Technologies, 9(1), p.2.
[7] Liu, Y., Jin, M., Pan, S., Zhou, C., Zheng, Y., Xia, F. and Yu, P.S., 2022. Graph self-supervised learning: A survey. IEEE Transactions on Knowledge and Data Engineering, 35(6), pp.5879-5900.