L. Wu | TU Delft Repository

I Choose You

Automated Hyperparameter Tuning for Deep Learning-based Side-channel Analysis

Journal article (2024) - Lichao Wu, Guilherme Perin, Stjepan Picek

Today, deep learning-based side-channel analysis is a widely researched topic, with numerous results indicating the advantages of the approach. Indeed, the ability to break protected implementations without complex feature selection has made deep learning a preferred option for profiling side-channel analysis. Still, mounting a successful deep learning-based side-channel analysis is not trivial. One of the biggest challenges is finding neural network hyperparameters that result in powerful side-channel attacks. This work proposes an automated approach to deep learning hyperparameter tuning based on Bayesian optimization. We build a custom framework, denoted AutoSCA, that supports both machine learning and side-channel metrics. Our experimental analysis shows that the framework performs well regardless of the dataset, leakage model, or neural network type. We find several neural network architectures that outperform state-of-the-art attacks. Finally, although random search is not considered a powerful option, we observe that neural networks obtained via random search can perform well, indicating that the publicly available datasets are relatively easy to break.
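The abstract contrasts Bayesian optimization with plain random search, which it finds can already perform well. As a minimal point of reference, the sketch below shows a random-search baseline only; it is not Bayesian optimization and not AutoSCA. The search space and the `evaluate` callback (assumed to train a model and return a lower-is-better score such as guessing entropy) are hypothetical.

```python
import random

# Hypothetical search space: names and ranges are illustrative assumptions,
# not the space used by AutoSCA.
SEARCH_SPACE = {
    "layers": [1, 2, 3, 4],
    "neurons": [10, 20, 50, 100, 200],
    "learning_rate": [1e-4, 5e-4, 1e-3, 5e-3],
    "activation": ["relu", "selu", "elu"],
}


def sample_config(space, rng):
    """Draw one hyperparameter configuration uniformly at random."""
    return {name: rng.choice(values) for name, values in space.items()}


def random_search(evaluate, space, trials, seed=0):
    """Return (best_score, best_config); lower scores are treated as better.

    `evaluate` is a hypothetical callback that would train a network with the
    given configuration and return a side-channel metric such as guessing entropy.
    """
    rng = random.Random(seed)
    best_score, best_config = float("inf"), None
    for _ in range(trials):
        config = sample_config(space, rng)
        score = evaluate(config)
        if score < best_score:
            best_score, best_config = score, config
    return best_score, best_config
```

A Bayesian optimizer would replace `sample_config` with a step that proposes the next configuration based on a surrogate model of the scores observed so far; the surrounding loop stays the same.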

Ablation Analysis for Multi-device Deep Learning-based Physical Side-channel Analysis

Journal article (2024) - Lichao Wu, Yoo-Seung Won, Dirmanto Jap, Guilherme Perin, Shivam Bhasin, Stjepan Picek

Deep learning-based side-channel analysis is an effective way of performing profiling attacks on power and electromagnetic leakages, even against targets protected with countermeasures. While many research articles have reported successful results, they typically focus on profiling and attacking a single device, assuming that leakages are similar between devices of the same type. However, this assumption is not always realistic due to variations in hardware and measurement setups, creating what is known as the portability problem. Profiling multiple devices has been proposed as a solution, but obtaining access to these devices may pose a challenge for attackers. This article proposes a new approach to overcome the portability problem by introducing a neural network layer assessment methodology based on the ablation paradigm. This methodology evaluates the sensitivity and resilience of each layer, providing valuable knowledge for creating a Multiple Device Model from Single Device (MDMSD). Specifically, it involves ablating a specific neural network section and performing recovery training. As a result, a profiling model initially trained on a single device can be generalized to leakage traces measured from various devices. By addressing the portability problem with a single device, this approach could make practical side-channel attacks more accessible and effective for attackers.

SoK

Deep Learning-based Physical Side-channel Analysis

Journal article (2023) - Stjepan Picek, Guilherme Perin, Luca Mariot, Lichao Wu, Lejla Batina

Side-channel attacks have represented a realistic and serious threat to the security of embedded devices for almost three decades. A variety of attacks, and of targets they can be applied to, have been introduced, and while the area of side-channel attacks and their mitigation is very well researched, it is yet to be consolidated. Deep learning-based side-channel attacks entered the field in recent years with the promise of more competitive performance and enlarged attacker capabilities compared to other techniques. At the same time, the new attacks bring new challenges and complexities to the domain, making the systematization of knowledge (SoK) even more critical. We first dissect deep learning-based side-channel attacks according to the different phases they can be used in and map those phases to the efforts conducted so far in the domain. For each phase, we identify the weaknesses and challenges that triggered the known open problems. We also connect the attacks to the threat models and evaluate their advantages and drawbacks. Finally, we provide a number of recommendations to be followed in deep learning-based side-channel attacks.

AutoPOI

Automated points of interest selection for side-channel analysis

Journal article (2023) - Mick G.D. Remmerswaal, Lichao Wu, Sébastien Tiran, Nele Mentens

Template attacks (TAs) are one of the most powerful side-channel analysis (SCA) attacks. The success of such attacks relies on how effectively the profiling model captures the leakage information. A crucial step for TA is to select relevant features from the measured traces, often called points of interest (POIs), to extract the leakage information. Previous research indicates that properly selecting the leaking input features can significantly increase the attack performance. However, due to the presence of SCA countermeasures and advancements in technology nodes, such features become increasingly difficult to extract with conventional approaches such as principal component analysis (PCA) and the sum of squared pairwise t-differences (SOST) method. This work proposes a framework, AutoPOI, based on proximal policy optimization to automatically find, select, and scale down features. The raw input features are first grouped into small regions. The best candidates selected by the framework are further scaled down with an online-optimized dimensionality reduction neural network. Finally, the framework rewards these features according to the results of TA. The experimental results show that the proposed framework can automatically extract features that lead to performance comparable to the state of the art on several commonly used datasets.
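SOST is one of the conventional baselines named above. For each sample point $t$ it sums, over all pairs of classes $i < j$, the squared difference of class means normalized by the class variances and counts: $\mathrm{SOST}(t) = \sum_{i<j} (m_i(t) - m_j(t))^2 / (\sigma_i^2(t)/n_i + \sigma_j^2(t)/n_j)$. The sketch below computes this score and keeps the top-$k$ points; it illustrates the baseline, not AutoPOI. Using the population variance and skipping zero-variance points are assumptions of this sketch.

```python
from itertools import combinations
from statistics import mean, pvariance


def sost(traces_by_class):
    """Per-sample SOST score.

    traces_by_class maps a class label (e.g., an intermediate value) to a list
    of traces; every trace is a list of floats of the same length.
    """
    stats = {}
    for label, traces in traces_by_class.items():
        columns = list(zip(*traces))  # one tuple of values per sample point
        stats[label] = (
            [mean(col) for col in columns],
            [pvariance(col) for col in columns],  # population variance (assumption)
            len(traces),
        )
    n_samples = len(next(iter(stats.values()))[0])
    scores = [0.0] * n_samples
    for a, b in combinations(stats, 2):  # every unordered pair of classes
        mean_a, var_a, n_a = stats[a]
        mean_b, var_b, n_b = stats[b]
        for t in range(n_samples):
            denom = var_a[t] / n_a + var_b[t] / n_b
            if denom > 0:  # skip points where both classes have zero variance
                scores[t] += (mean_a[t] - mean_b[t]) ** 2 / denom
    return scores


def select_pois(scores, k):
    """Indices of the k highest-scoring sample points, returned in trace order."""
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    return sorted(ranked[:k])
```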

The Circle of DL-SCA: Improving Deep Learning-based Side-channel Analysis

Doctoral thesis (2023) - L. Wu

For almost three decades, side-channel analysis has represented a realistic and severe threat to the security of embedded devices. As a well-known and influential class of implementation attacks, side-channel analysis has been applied against cryptographic implementations, processors, communication systems, and, more recently, machine learning models. Two reasons make these attacks powerful. First, they take advantage of unintended information leakages that security designers can easily overlook. These leakages can be conveyed through various sources, such as power consumption, electromagnetic emanations, timing, temperature, and acoustic and photonic emissions. Protecting against such leakages can be challenging and costly. Second, such attacks do not require complicated and expensive equipment or frameworks. Commonly, an adversary uses an oscilloscope to monitor some of these side-channel leakages, then performs statistical analysis to find the relation between the leakages and the values actually processed, and finally uses these relations to recover secret information. Fortunately, hardware and software developers are prepared for these attack methods: several protection mechanisms, also called side-channel countermeasures, have been implemented to increase the security assurance of their devices. However, this cat-and-mouse game has now changed with the rise of artificial intelligence in side-channel analysis. Some countermeasures that are resilient to conventional methods can be easily bypassed by machine learning. This thesis aims to improve the capability of side-channel analysis using deep learning techniques. Specifically, we propose approaches covering the complete deep learning-based side-channel analysis procedure (which we denote "The Circle of DL-SCA"). In chapter 2, before the leakages are used to launch actual attacks, we offer strategies for improving the "quality" of the leakages from various aspects.
Then, in chapter 3, the study focuses on critical deep learning hyperparameters and proposes two automated neural architecture search methods that relieve evaluators of the burden of tuning the neural network. Besides developing new attack strategies, we also focus on existing attack methods and investigate how to enhance their efficiency, robustness, and explainability. Chapter 4 introduces an efficient learning scheme that reduces the number of required training traces. Then, we develop an attack evaluation metric that reliably reflects the performance and robustness of the model. In chapter 5, we create a novel methodology to evaluate the influence of noise and countermeasures on deep learning models and then apply the research outcomes to design low-cost countermeasures that are resilient against deep learning. Our research outcomes will push designers to develop more secure devices. The feed-forward loop between us (researchers) and designers can eventually make the electronic world more secure.

No (good) loss no gain

Systematic evaluation of loss functions in deep learning-based side-channel analysis

Journal article (2023) - Maikel Kerkhof, Lichao Wu, Guilherme Perin, Stjepan Picek

Deep learning is a powerful direction for profiling side-channel analysis as it can break targets protected with countermeasures even with a relatively small number of attack traces. Still, reaching strong attack performance requires hyperparameter tuning, which can be far from trivial. Besides the many options stemming from the machine learning domain, recent years have also brought neural network elements specially designed for side-channel analysis. The loss function, which calculates the error between the actual and desired outputs, is one of the most important neural network elements. The resulting loss values guide the update of the weights associated with the connections between the neurons or filters of the deep neural network. Unfortunately, despite the loss function being a highly relevant hyperparameter, there has been no systematic comparison of different loss functions regarding their effectiveness in side-channel attacks. This work provides a detailed study of the efficiency of different loss functions in the SCA context. We evaluate five loss functions commonly used in machine learning and three loss functions specifically designed for SCA. Our results show that an SCA-specific loss function (called CER) performs very well and outperforms other loss functions in most evaluated settings. Still, categorical cross-entropy represents a good option, especially considering the variety of neural network architectures.

The Need for Speed

A Fast Guessing Entropy Calculation for Deep Learning-Based SCA

Journal article (2023) - Guilherme Perin, Lichao Wu, Stjepan Picek

The adoption of deep neural networks for profiling side-channel attacks opened new perspectives for leakage detection. Recent publications showed that cryptographic implementations featuring different countermeasures could be broken without feature selection or trace preprocessing. This success comes at a high price: an extensive hyperparameter search to find optimal deep learning models. As deep learning models usually suffer from overfitting due to their high fitting capacity, it is crucial to avoid over-training regimes, which requires choosing the correct number of epochs. For that, early stopping is employed as an efficient regularization method, which requires a consistent validation metric. Although guessing entropy is a highly informative metric for profiling side-channel attacks, it is time-consuming to compute, especially when it is computed for every epoch during training and the number of validation traces is large. This paper shows that guessing entropy can be efficiently computed during training by reducing the number of validation traces without affecting the quality of early stopping decisions. Our solution significantly speeds up the process, which benefits the hyperparameter search and the overall profiling attack. Our fast guessing entropy calculation is up to 16× faster, allowing more hyperparameter tuning experiments and helping security evaluators find more efficient deep learning models.
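To make the cost argument concrete, the sketch below computes guessing entropy as the average rank of the correct key over several random attack subsets: per-trace log-probabilities are summed for every key guess, and the correct key's rank is the number of guesses that score strictly higher (ties favor the correct key, an assumption of this sketch). The work is proportional to `n_traces × n_keys × n_attacks`, which is why fewer validation traces give a faster estimate. This is a generic GE sketch, not the paper's exact procedure; the input format (log-probabilities already mapped from labels to key guesses) is assumed.

```python
import random


def key_rank(log_probs, correct_key):
    """Rank (0 = best) of the correct key after summing log-likelihoods.

    log_probs[i][k] is the model's log-probability for key guess k on trace i
    (assumed to be already mapped from predicted labels to key guesses).
    """
    n_keys = len(log_probs[0])
    totals = [sum(trace[k] for trace in log_probs) for k in range(n_keys)]
    # Count key guesses that score strictly better than the correct key.
    return sum(1 for k in range(n_keys) if totals[k] > totals[correct_key])


def guessing_entropy(log_probs, correct_key, n_traces, n_attacks, seed=0):
    """Average key rank over n_attacks random subsets of n_traces traces."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_attacks):
        subset = rng.sample(log_probs, n_traces)  # random attack set
        ranks.append(key_rank(subset, correct_key))
    return sum(ranks) / n_attacks
```

Calling `guessing_entropy` with a smaller `n_traces` after each epoch is the kind of reduction the abstract describes; how small the subset can be without distorting early stopping is the question the paper studies.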

Label Correlation in Deep Learning-Based Side-Channel Analysis

Journal article (2023) - Lichao Wu, Léo Weissbart, Marina Krcek, Huimin Li, Guilherme Perin, Lejla Batina, Stjepan Picek

The efficiency of profiling side-channel analysis can be significantly improved with machine learning techniques. Although powerful, machine learning has a fundamental limitation, namely being data-hungry, that has received little attention in the side-channel community. In practice, the maximum number of leakage traces that evaluators/attackers can obtain is constrained by the scheme requirements or the limited accessibility of the target. Even worse, the various countermeasures in modern devices increase the number of profiling traces needed to break the target. This work demonstrates a practical approach to dealing with the lack of profiling traces. Instead of learning from one-hot encoded labels, transforming the labels into their distribution can significantly speed up the convergence of guessing entropy. By studying the relationship between all possible key candidates, we propose a new metric, denoted Label Correlation (LC), to evaluate the generalization ability of the profiling model. We validate LC in two common use cases, early stopping and network architecture search, and the results indicate its superior performance.

The Best of Two Worlds

Deep Learning-assisted Template Attack

Journal article (2022) - Lichao Wu, Guilherme Perin, Stjepan Picek

In the last decade, machine learning-based side-channel attacks have become a standard option when investigating profiling side-channel attacks. At the same time, the previous state-of-the-art technique, the template attack, started losing its importance and came to be considered mostly a baseline to compare against. As such, most results reported that machine learning (and especially deep learning) could significantly outperform the template attack. Nevertheless, the template attack still has certain advantages even compared to deep learning. The most significant one is that it has only a few hyperparameters to tune, making it easier to use. We take another look at the template attack and devise a feature engineering phase that allows the template attack to compete with or even outperform state-of-the-art deep learning-based side-channel attacks. More precisely, with a novel distance metric customized for side-channel analysis, we show how a deep learning technique called similarity learning can be used to find highly efficient embeddings of input data with one epoch of training, which can then be fed into the template attack, resulting in powerful attacks.

Gambling for Success

The Lottery Ticket Hypothesis in Deep Learning-Based Side-Channel Analysis

Book chapter (2022) - Guilherme Perin, Lichao Wu, Stjepan Picek

Deep learning-based side-channel analysis (SCA) represents a strong approach for profiling attacks. Still, this does not mean it is trivial to find neural networks that perform well in every setting. Among the developed neural network architectures, we can distinguish small neural networks, which are easier to tune and less prone to overfitting but may lack the capacity to model the data, from large neural networks, which have sufficient capacity but can overfit and are more difficult to tune. This brings an interesting trade-off between simplicity and performance. This work proposes to use a pruning strategy and the recently proposed Lottery Ticket Hypothesis (LTH) as an efficient method to tune deep neural networks for profiling SCA. Pruning provides a regularization effect on deep neural networks and reduces the overfitting of overparameterized models. We demonstrate that we can find pruned neural networks that perform on the level of larger networks while reducing the number of weights by more than 90% on average. This way, pruning and LTH approaches become alternatives to costly and difficult hyperparameter tuning in profiling SCA. Our analysis is conducted over different masked AES datasets and for different neural network topologies. Our results indicate that pruning, and more specifically LTH, can result in competitive deep learning models.
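One round of the pruning-plus-LTH idea can be sketched on a flat list of weights: remove the fraction of weights with the smallest trained magnitude, then rewind the surviving weights to their initial values so the sparse network can be retrained. This sketch assumes global magnitude pruning and a single pruning round; the paper's exact pruning criterion and schedule are not given in the abstract, and real implementations apply the mask per tensor, typically over several rounds.

```python
def magnitude_prune_mask(weights, prune_ratio):
    """Global magnitude pruning over a flat list of weights.

    Returns a 0/1 mask that removes the prune_ratio fraction of weights with
    the smallest absolute value.
    """
    n_prune = int(len(weights) * prune_ratio)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:  # smallest-magnitude weights are pruned
        mask[i] = 0
    return mask


def lottery_ticket_rewind(initial_weights, trained_weights, prune_ratio):
    """One LTH-style round: prune by trained magnitude, rewind survivors to init.

    Returns (rewound_weights, mask); the rewound network is then retrained
    with the mask kept fixed.
    """
    mask = magnitude_prune_mask(trained_weights, prune_ratio)
    rewound = [w0 * m for w0, m in zip(initial_weights, mask)]
    return rewound, mask
```

For example, with trained magnitudes `[0.9, 0.01, 0.05, 0.7]` and `prune_ratio=0.5`, the two middle weights are removed and the other two restart from their initial values.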

Reinforcement Learning-Based Design of Side-Channel Countermeasures

Conference paper (2022) - Jorai Rijsdijk, Lichao Wu, Guilherme Perin

Deep learning-based side-channel attacks are capable of breaking targets protected with countermeasures. The constant progress in the last few years has made the attacks more powerful, requiring fewer traces to break a target. Unfortunately, to protect against such attacks, we still rely solely on methods developed to protect against generic attacks. Works that consider the protection perspective are few and are usually based on the concept of adversarial examples, which is not always easy to translate to real-world hardware implementations. In this work, we ask whether we can develop combinations of countermeasures that protect against side-channel attacks. We consider several widely adopted hiding countermeasures and use the reinforcement learning paradigm to design specific countermeasures that show resilience against deep learning-based side-channel attacks. Our results show that it is possible to significantly enhance the target's resilience to the point where deep learning-based attacks cannot obtain secret information. At the same time, we consider the cost of implementing such countermeasures to balance security and implementation costs. The optimal countermeasure combinations can serve as development guidelines for real-world hardware/software-based protection schemes.

Focus is Key to Success

A Focal Loss Function for Deep Learning-Based Side-Channel Analysis

Conference paper (2022) - Maikel Kerkhof, Lichao Wu, Guilherme Perin, Stjepan Picek

Deep learning-based side-channel analysis represents one of the most powerful side-channel attack approaches. Thanks to its ability to deal with raw features and countermeasures, it has become the de facto standard approach in the SCA community. Recent works have significantly improved deep learning-based attacks from various perspectives, such as hyperparameter tuning, design guidelines, or custom neural network architecture elements. Still, insufficient attention has been given to the core of the learning process: the loss function. This paper analyzes the limitations of the existing loss functions and then proposes a novel side-channel analysis-optimized loss function, Focal Loss Ratio (FLR), to cope with the drawbacks identified in other loss functions. To validate our design, we 1) conduct a thorough experimental study considering various scenarios (datasets, leakage models, neural network architectures) and 2) compare with other loss functions used in deep learning-based side-channel analysis (both “traditional” ones and those designed for side-channel analysis). Our results show that the FLR loss outperforms other loss functions in various conditions without the computational overhead of some recent loss function proposals.
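The abstract does not give the FLR formula. As background only, the sketch below shows the standard focal loss (Lin et al., 2017), $\mathrm{FL}(p_t) = -\alpha (1 - p_t)^{\gamma} \log p_t$, next to categorical cross-entropy for a single sample; it is not FLR. The factor $(1 - p_t)^{\gamma}$ down-weights samples the model already classifies confidently, and with $\gamma = 0$, $\alpha = 1$ the focal loss reduces to cross-entropy. The default values `gamma=2.0` and `alpha=1.0` are assumptions of this sketch.

```python
import math


def cross_entropy(probs, label):
    """Categorical cross-entropy for one sample: -log(p_label)."""
    return -math.log(probs[label])


def focal_loss(probs, label, gamma=2.0, alpha=1.0):
    """Standard focal loss for one sample: -alpha * (1 - p_t)**gamma * log(p_t).

    This is the generic focal loss, not the FLR loss proposed in the paper.
    """
    p_t = probs[label]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
```

For a softmax output of `[0.7, 0.2, 0.1]` and true label 0, the focal loss with `gamma=2` is 0.09 times the cross-entropy, while for true label 2 the factor is 0.81, so poorly predicted samples dominate the gradient.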

On the Evaluation of Deep Learning-Based Side-Channel Analysis

Conference paper (2022) - Lichao Wu, Guilherme Perin, Stjepan Picek

Deep learning-based side-channel analysis is rapidly positioning itself as the de facto standard for the most powerful profiling side-channel analysis. The results from the last few years show that deep learning techniques can efficiently break targets even when they are protected with countermeasures. While there are constant improvements in making deep learning-based attacks more powerful, little has been done to evaluate the attacks' performance. Indeed, from the perspective of evaluation metrics, the evaluation process today does not differ from what was done more than a decade ago. This paper considers how to evaluate deep learning-based side-channel analysis and whether the commonly used approaches give the best results. To that end, we consider different summary statistics and the influence of algorithmic randomness on the stability of profiling models. Our results show that, besides commonly used metrics like guessing entropy, one should also report the standard deviation to assess the attack performance properly. Even more importantly, using the arithmetic mean for guessing entropy does not yield the best results, and the median should be used instead.
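The recommendation can be illustrated with a small summary function over the final key ranks of several independently trained models (i.e., repeated runs affected by algorithmic randomness). The rank values in the usage note are made-up toy numbers, not results from the paper.

```python
from statistics import mean, median, stdev


def summarize_ge(final_ranks):
    """Summarize key ranks obtained from repeated, independently trained models.

    Reports the mean, median, and sample standard deviation so that a few
    unstable runs do not hide behind a single averaged number.
    """
    return {
        "mean": mean(final_ranks),
        "median": median(final_ranks),
        "stdev": stdev(final_ranks),
    }
```

With the toy ranks `[0, 0, 0, 1, 0, 0, 120, 0, 2, 0]`, one unstable run pulls the mean up to 12.3 while the median stays at 0, and the large standard deviation exposes the instability that the mean alone would blur.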

Exploring Feature Selection Scenarios for Deep Learning-based Side-channel Analysis

Journal article (2022) - Guilherme Perin, Lichao Wu, Stjepan Picek

One of the main promoted advantages of deep learning in profiling side-channel analysis is the possibility of skipping the feature engineering process. Despite that, most recent publications still perform feature selection, as the attacked interval of the side-channel measurements is pre-selected. This is similar to the worst-case security assumptions in security evaluations, where the random secret shares (e.g., mask shares) are known during the profiling phase: an evaluator can identify the locations of points of interest and efficiently trim the trace interval. To broadly understand how feature selection impacts the performance of deep learning-based profiling attacks, this paper investigates three different feature selection scenarios that could realistically be used in practical security evaluations. The scenarios range from the minimum possible number of features (worst-case security assumptions) to the whole available traces. Our results show that deep neural networks used as profiling models achieve successful key recovery in all explored feature selection scenarios against first-order masked software implementations of AES-128. First, we show that feature selection under worst-case security assumptions results in optimal profiling models that are highly dependent on the number of features and signal-to-noise ratio levels. Second, we demonstrate that attacking raw side-channel measurements with small deep neural networks also yields optimal models, which narrows the gap between worst-case security evaluations and online (realistic) profiling attacks. In all explored feature selection scenarios, the hyperparameter search always finds a successful model with up to eight hidden layers for MLPs and CNNs, suggesting that complex models are not required for the considered datasets. Our results demonstrate key recovery with fewer than ten attack traces for all datasets in at least one of the feature selection scenarios.
Additionally, in several cases, we can recover the target key with a single attack trace.

Reinforcement learning for hyperparameter tuning in deep learning-based side-channel analysis

Journal article (2021) - Jorai Rijsdijk, Lichao Wu, Guilherme Perin, Stjepan Picek

Deep learning represents a powerful set of techniques for profiling side-channel analysis. The results of the last few years show that neural network architectures like multilayer perceptrons and convolutional neural networks give strong attack performance, making it possible to break targets protected with various countermeasures. Considering that deep learning techniques commonly have a plethora of hyperparameters to tune, such top attack results can come at a high cost in preparing the attack. This is especially problematic as the side-channel community commonly uses random search or grid search to look for the best hyperparameters. In this paper, we propose to use reinforcement learning to tune convolutional neural network hyperparameters. In our framework, we investigate the Q-Learning paradigm and develop two reward functions that use side-channel metrics. We conduct an investigation on three commonly used datasets and two leakage models, and the results show that reinforcement learning can find convolutional neural networks exhibiting top performance while having a small number of trainable parameters. We note that our approach is automated and can be easily adapted to different datasets. Several of our newly developed architectures outperform the current state-of-the-art results. Finally, we make our source code publicly available.
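At the heart of the Q-Learning paradigm is the tabular update $Q(s, a) \leftarrow Q(s, a) + \alpha\,(r + \gamma \max_{a'} Q(s', a') - Q(s, a))$. The sketch below implements that update with a dictionary as the Q-table. Treating a state as the tuple of layers chosen so far and an action as the next layer type is a hypothetical encoding for illustration, and the reward is a placeholder for the side-channel-metric-based rewards the paper develops; the learning rate `alpha` and discount `gamma` defaults are also assumptions.

```python
def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Tabular Q-learning update, returning the new Q(state, action).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    Unseen (state, action) pairs default to 0.0; pass actions=[] for a
    terminal next_state so that the bootstrap term is 0.
    """
    best_next = max((q_table.get((next_state, a), 0.0) for a in actions),
                    default=0.0)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q_table[(state, action)]
```

For instance, `q_update(q, (), "conv", 1.0, ("conv",), ["conv", "dense"])` on an empty table returns 0.1, nudging the estimate for "start with a convolutional layer" toward the observed reward.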

On the Importance of Pooling Layer Tuning for Profiling Side-Channel Analysis

Conference paper (2021) - Lichao Wu, Guilherme Perin

In recent years, the advent of deep neural networks opened new perspectives for security evaluations with side-channel analysis. Profiling attacks now benefit from capabilities offered by convolutional neural networks, such as dimensionality reduction and the inherent ability to reduce the effects of trace desynchronization. These neural networks contain at least three types of layers: convolutional, pooling, and dense layers. Although the configuration of pooling layers has a large impact on neural network performance, no study of the effect of pooling hyperparameters on side-channel analysis has yet been provided in the academic community. This paper provides extensive experimental results demonstrating how the pooling layer type, stride, and size affect profiling attack performance with convolutional neural networks. Additionally, we demonstrate that pooling hyperparameters can be larger than those commonly used in related works while still keeping good performance for profiling attacks on specific datasets.
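To see how pooling size and stride control dimensionality reduction, the sketch below implements unpadded ("valid") 1-D max and average pooling on a single trace, together with the output-length formula $\lfloor (n - \text{size}) / \text{stride} \rfloor + 1$. Larger pools and strides shrink the trace more aggressively, which is the hyperparameter range the paper explores. Operating on one channel without padding is a simplifying assumption.

```python
def pooled_length(n, pool_size, stride):
    """Output length of an unpadded ('valid') 1-D pooling layer."""
    return (n - pool_size) // stride + 1


def pool1d(signal, pool_size, stride, mode="max"):
    """Unpadded 1-D max or average pooling over a list of numbers."""
    if mode == "max":
        reducer = max
    elif mode == "avg":
        reducer = lambda window: sum(window) / len(window)
    else:
        raise ValueError("mode must be 'max' or 'avg'")
    # Window start positions: 0, stride, 2*stride, ... while the window fits.
    return [
        reducer(signal[i:i + pool_size])
        for i in range(0, len(signal) - pool_size + 1, stride)
    ]
```

For the trace `[1, 3, 2, 5, 4, 4, 0, 1]`, a pool of size 2 with stride 2 halves the length: max pooling yields `[3, 5, 4, 1]` and average pooling yields `[2.0, 3.5, 4.0, 0.5]`.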

Remove Some Noise

On Pre-processing of Side-channel Measurements with Autoencoders

Journal article (2020) - Lichao Wu, Stjepan Picek

In profiled side-channel analysis, deep learning-based techniques have proved very successful, even when attacking targets protected with countermeasures. Still, there is no guarantee that deep learning attacks will always succeed. Various countermeasures make attacks significantly more complex, and such countermeasures can be combined to make the attacks even more challenging. An intuitive way to improve the performance of attacks would be to reduce the effect of countermeasures.
This paper investigates whether we can consider certain types of hiding countermeasures as noise and then use a deep learning technique called the denoising autoencoder to remove that noise. We conduct a detailed analysis of six different types of noise and countermeasures, applied separately or combined, and show that the denoising autoencoder significantly improves the attack performance.
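A denoising autoencoder is trained to map noisy inputs back to clean targets, so its training data consists of (noisy, clean) trace pairs. The sketch below covers only that data-preparation step, assuming a set of clean reference traces is available; the two noise models shown (additive Gaussian noise and a random circular shift standing in for desynchronization) are illustrative assumptions and do not reproduce the paper's six noise types or its autoencoder architecture.

```python
import random


def add_gaussian_noise(trace, sigma, rng):
    """Additive Gaussian noise with standard deviation sigma."""
    return [x + rng.gauss(0.0, sigma) for x in trace]


def desynchronize(trace, max_shift, rng):
    """Random circular shift of up to max_shift samples (simulated desync)."""
    shift = rng.randint(0, max_shift)
    return trace[shift:] + trace[:shift]


def make_denoising_pairs(clean_traces, sigma=0.5, max_shift=10, seed=0):
    """Build (noisy_input, clean_target) pairs for denoising-autoencoder training."""
    rng = random.Random(seed)
    pairs = []
    for trace in clean_traces:
        noisy = desynchronize(add_gaussian_noise(trace, sigma, rng), max_shift, rng)
        pairs.append((noisy, trace))
    return pairs
```

The autoencoder would then be fit with the noisy traces as inputs and the clean traces as targets (for example, under a mean squared error loss), after which attack traces are passed through it before the profiling attack.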

A fast characterization method for semi-invasive fault injection attacks

Conference paper (2020) - Lichao Wu, Gerard Ribera, Noemie Beringuier-Boher, Stjepan Picek

Semi-invasive fault injection attacks are powerful techniques well known to attackers and secure embedded system designers. When performing such attacks, the selection of the fault injection parameters is of utmost importance and is usually based on the attacker's experience. Surprisingly, there is no formal and general approach to characterize the target's behavior under attack. In this work, we present a novel methodology to perform a fast characterization of the fault injection impact on a target, depending on the possible attack parameters. We experimentally show that our methodology is successful when targeting different algorithms, such as DES and AES encryption, and then extend it to a full characterization with the help of deep learning. Finally, we show how the characterization results are transferable between different targets.