# **SCA Strikes Back** # Reverse Engineering Neural Network Architectures using Side Channels Batina, Lejla; Bhasin, Shivam; Jap, Dirmanto; Picek, Stjepan 10.1109/MDAT.2021.3128436 **Publication date** **Document Version** Final published version Published in IEEE Design and Test Citation (APA) Batina, L., Bhasin, S., Jap, D., & Picek, S. (2022). SCA Strikes Back: Reverse Engineering Neural Network Architectures using Side Channels. *IEEE Design and Test*, *39*(4), 7-14. Article 9615240. https://doi.org/10.1109/MDAT.2021.3128436 #### Important note To cite this publication, please use the final published version (if applicable). Please check the document version above. Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons. Takedown policy Please contact us and provide details if you believe this document breaches copyrights. We will remove access to the work immediately and investigate your claim. # Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public. # SCA Strikes Back: Reverse-Engineering Neural Network Architectures Using Side Channels # Lejla Batina Radboud University, 6500 GL Nijmegen, The Netherlands #### **Shivam Bhasin and Dirmanto Jap** Nanyang Technological University, Singapore 637553 # Stjepan Picek Delft University of Technology, 2628 XE Delft, The Netherlands #### Editor's notes: This article proposes an attack that steals a neural network using side-channel attacks. —Jeyavijayan "JV" Rajendran, Texas A&M University Machine Learning, and, more recently, deep learning have become mainstream research and development directions due to their unquestionable practicality and effectiveness. The ever increasing computational capabilities of modern computers and huge amounts of available data result in ever more complex and effective machine learning architectures. Deep learning algorithms gain popularity in edge devices such as sensors or actuators, as they are indispensable for real-time processing. Consequently, there is an increasing interest in deploying neural networks on low-power processors found in always-on systems like ARM Cortex-M microcon- Digital Object Identifier 10.1109/MDAT.2021.3128436 Date of publication:16 November 2021; date of current version: 22 June 2022. trollers. It is expected that by 2024, the number of edge-based artificial intelligence chips to be doubled [1]. With increasing number of design strategies and machine learning algorithms in use, fine-tuning algorithm's hyperparameters are emerging as one of the main challenges. The design and training of a machine learning model is a challenging procedure and an expensive one, so a well-trained model has a monetary value. For instance, the cost of training a machine learning model can be more than \$1 million [2]. We are also witnessing an increase in intellectual property (IP) model strategies. In cases when optimized neural networks are of commercial interest, their details must be kept undisclosed. IP thefts of trained machine learning models through side-channel attacks are becoming a major threat. Setting aside privacy issues, obtaining valuable information from neural network architectures can help acquire trade secrets from the competition, leading to losses in competitive advantage. Despite the advantages of using machine learning-enabled edge devices, it becomes harder to ensure the confidentiality of the developed model as the devices operate in an environment where physical side-channels analysis becomes a real threat. This work was originally published in USENIX Security 2019 [3]. Since then, several new results have been published, inspiring this line of research on side-channel and fault attacks on neural networks. Prior to this work, reverse-engineering attacks on neural networks mostly relied on observing the outputs of the neural network and training a substitute model or exploiting specific design choices. This work shows that it is possible to recover the layout of unknown neural networks by exploiting the available physical (side-channel) information. Our approach does not need access to training data and allows for neural network recovery by feeding known random inputs. # Background ### Machine learning Machine learning, in general, is based on the idea that a system can "learn" from examples by extracting patterns or discovering information without human intervention [4]. There are many different machine learning algorithms. Today, the most popular algorithms come from the neural network family and are based on the deep learning paradigm. #### Side-channel analysis Side-channel analysis (SCA) exploits the vulnerabilities of implementations. It was first demonstrated on cryptographic implementations [5]. SCA shows that even for theoretically secure algorithms, observing the unintentional physical or side-channel leakages (such as timing, power, electromagnetic (EM) emanation) from their implementations could lead to the potential recovery of secret information. Next, we describe some of the most common methods used in SCA, which we will also use in our attacks. We discuss timing analysis, simple power analysis (SPA), and differential power analysis (DPA). - Timing analysis: When the algorithm is implemented, different operations lead to different timing execution. If the execution time depends on sensitive parameters, it leaks sensitive information to the adversary. In our attack, we exploit the unique timing behavior of various activation functions. - Simple power analysis: In SPA, one learns sensitive information from one or a few traces, with basic techniques like a visual inspection supported by signal processing. In the context of neural network reverse engineering, SPA can determine the number of neurons and even the number of layers in some cases. - 3. Differential power analysis: The attack applies statistical techniques for the secret recovery. The general idea is to test or identify statistical dependencies between the physical leakage and the hypothetical intermediate value (secret-dependent). For example, the adversary could compute a (nonlinear) function between the known value and hypothetical secret. The adversary then applies a leakage model on the output, which is generally device-dependent (e.g., the Hamming weight (HW) and Hamming distance model). Statistical tests are then used to compare different hypothetical values (influenced by the different hypotheses of the secret) with the physical leakage. The most commonly used statistical method is correlation, as used, for example, in correlation power analysis (CPA). It computes the Pearson correlation between each hypothetical output and the physical leakages. The hypothetical secret that leads to the highest absolute correlation value is then deemed the right guess. We use CPA to recover the secret weights as well as to determine the layer boundaries. #### Model recovery techniques overview This section provides a brief introduction to the machine learning model recovery attack in embedded devices using EM side-channels. Interested readers are referred to [3] for extensive technical details of the attack process. #### Threat model The threat model for the attack assumes an adversary interested in recovering the architecture (hyperparameters) and parameters of the target 8 IEEE Design&Test $<sup>^{1}</sup>$ Note that despite the attack name (power analysis), it could also be used on another side-channel leakage, such as the EM emanation. model. The target is a pretrained neural network model executed on an embedded device while running inference. The adversary can query the model with known/chosen inputs and passively observe side-channel information corresponding to the executed inference. For the following experiment, we observe EM side-channel signatures, thus requiring physical access to the device. While most model extraction attacks need access to the original training data set (or similar data set), the attack proposed in the following does not need access to training data. As shown later, an adversary can feed random known inputs to extract the model. To be as generic as possible, we work with randomly chosen real numbers as inputs. Finally, the target model is assumed to have no side-channel countermeasures implemented, which is (unfortunately) true almost everywhere in practice today. #### Experimental setup The experimental setup comprises of the target embedded device (e.g., 8-bit Atmel ATmega328P and 32-bit ARM Cortex-M3), executing the model, an EM probe to monitor side-channel activity, a digital oscilloscope to capture measured side-channel activity, and an optional pre-amplifier to boost the measured signal. The side-channel activity is captured using the Lecroy WaveRunner 610zi oscilloscope using an RF-U 5-2 near-field EM probe from Langer and a 30-dB preamplification. We use available handshaking signals like the start/stop of computation to synchronize the measurements. Each measurement (or trace) corresponds to one randomly chosen input. Every trace is composed of several samples (or points), where the number of samples can go in the range of millions when measuring a complete inference. Since we use a microcontroller, the neurons are executed sequentially. The attack targets leakage corresponding to the loading of sensitive parameters in the data bus, which is known to leak with the HW model, that is, proportional to the number of bits equal to "1" in the sensitive variable [5]. The target models are implemented in C language and pretrained offline. #### Recovering neural network parameters First, we implement a simple multilayer perceptron (MLP) as a toy example. MLP is a feedforward neural network that maps sets of inputs onto sets of appropriate outputs. It consists of multiple layers of nodes in a directed graph. Each node in a layer is connected to every node in the subsequent layer, and each connection is associated with a certain weight parameter. The implemented architecture consists of one hidden layer with six neurons. Each neuron implements the input multiplication followed by the Sigmoid activation function. The execution sequence as captured on the side-channel trace is shown in Figure 1. Notice that the multiplication and activation are clearly distinguishable (separated by the red line for readability). 1) Recovering activation function: Activation functions are the main nonlinear component of a neural network [4]. As the activation function is clearly distinguishable on the captured EM trace, one can easily measure the timing execution from the EM trace and be precise to a nanosecond scale. We observed that all activation functions have a unique timing behavior, which leaks information about the function used. We analyze the timing behavior of four commonly used activation functions: ReLU, sigmoid, tanh, and Softmax. The timing behavior for 2,000 random inputs is shown in Figure 2 and allows distinguishing each activation function. To recover activation functions for the whole network, an adversary feeds random inputs and records the execution timing for each activation function in each neuron. For a Figure 1. Observing pattern and timing of multiplication and activation function. Figure 2. Timing behavior for different activation functions. (a) ReLU. (b) Sigmoid. (c) Tanh. (d) Softmax. Figure 3. Recovery of weights in a neural network. (a) First byte recovery (sign and 7-bit exponent). (b) Second byte recovery (LSB exponent and mantissa). modern oscilloscope, side-channel activity for all the neurons can be captured at once for one input, and the same traces can be used to recover activation functions for the whole network. 2) Recovering neural network weights: The weights of a pretrained model make the core of the IP. In many cases, the architecture might be known publicly, but it is the weights resulting from detailed training that distinguishes a good model from a bad one. We target weight recovery with CPA. It is assumed that the adversary can synchronize the weight multiplication from one input to another, using widely available techniques in the side-channel literature [5]. The attack targets multiplication of secret weight w with *i*th known input $x_i$ , resulting in product $p_i$ . The leakage occurs when p is computed and stored back in the memory. While the implementation of multiplication can vary (schoolbook, software-optimized, hardware-accelerated), the storage of p will leak, and thus it is easier to target it. On the microcontroller, writing p to memory follows the HW leakage. Thus, the CPA computes Pearson correlation p[t,HW(p)]for all hypotheses of w, corresponding to a set of inputs x. Here, t represents the set of side-channel traces captured corresponding to inputs x. Given a sufficient number of traces, the correlation for correct w will stand out from other wrong hypotheses. This is analogous to secret key recovery in cryptography, where the HW leakage of a key-dependent intermediate value is targeted for known plaintext to find the secret key with the highest correlation. Still, there is an important difference. In cryptography, we require exact key recovery, but here, some precision errors can be tolerated. The underlying implementation treats weights in IEEE 754 representation, where each weight is represented in 32 bits. The most significant bit represents the sign, the next eight bits contain the exponent, and the remaining 23 are reserved for the mantissa. We recover them as four independent bytes in four independent attacks. The traces remain the same as they all correspond to the same multiplication, and only our hypothesis changes when moving from one byte to another. Of course, the first two bytes are more important, comprising of sign, exponent, and most significant mantissa bits. The attack on the first two bytes is shown in Figure 3. The black line represents the correlation with the correct weight and the red lines for incorrect weight. The y-axis represents absolute correlation and the number of traces (or corresponding inputs queried) on the x-axis. The attack is considered successful when the black line depicts a higher correlation over the red line in a conclusive manner. With around 200 traces, the correct weight can be identified. The same attack must be repeated on each multiplication to recover other weights. # Recovering neural network architecture Once the weights and activation functions are recovered, only the architecture remains to be recovered. This is performed using SPA, which relies on visual inspection of side-channel measurements to learn sensitive information. 10 IEEE Design&Test Figure 4. SPA on hidden layers. (a) (6). (b) (6, 5). (c) (6, 5, 5). We noticed that the neurons have a very distinct side-channel signature in a sequential execution setup like ours. Consider Figure 4, which shows the execution signature of three neural networks with (6), (6, 5), and (6, 5, 5) architectures. Here (a, b, c) represents a feedforward neural network with three hidden layers and a, b, c neurons in each layer, respectively, starting from the input layer. As shown in Figure 4, the number of neurons can be easily recovered with SPA. Layer boundaries are not clear by SPA, and CPA is used for that purpose. Here, CPA exploits the fact that neurons in the first layer will show a higher correlation with the inputs than the second and later layers, allowing the identification of neurons in the first layer. The boundaries of different layers can be determined similarly. #### **Evaluation** A combination of previously discussed techniques recovers the full neural network. The recovery is performed layer by layer, and neuron by neuron. The recovery of the previous layer allows the adversary to compute inputs to the next layer and continue the attack to recover the weights and structure. The methodology to reverse engineer a neural network is displayed in Figure 5. This methodology scales linearly with the size of the neural network. # Reverse-engineering MLP We consider an MLP with (50, 30, 20, 50) architecture that was previously used for side-channel applications in [6]. This neural network is implemented in ARM Cortex-M3 as it allows testing our approach with considerably larger neural network models than discussed up to now. All the activation functions are ReLU except the output layer, which uses Softmax. The measurement trace is shown in Figure 6(a). The data set is DPAcontest v4 with 50 samples and 75,000 measurements where the first 50,000 measurements are used for training and the rest for testing. The data set has nine classes. The four layers and their boundaries are clearly distinguishable. Moving forward, we show the measurement for one neuron in the third layer in Figure 6(b), where 20 multiplication peaks and ReLU peaks are visible. We performed the neural network model extraction with the previously described approach. The recovered model has an accuracy of 0.6087, compared to 0.6090 for the original model. Figure 5. Methodology to reverse engineer a neural network. Figure 6. Reverse-engineering neural networks on ARM CortexM3 (a) showing full execution trace allowing identification of each layer and (b) showing a zoom-in at one neuron execution in the third layer with expected 20 multiplication peaks followed by a ReLU execution peak. (a) Full EM trace for MLP (50, 30, 20, 50). (b) EM trace for one neurons in the third layer of MLP (50, 30, 20, 50). #### Reverse-engineering CNN We finally extend the proposed attack methodology to convolutional neural networks (CNNs). CNNs are inspired by the biological processes of animals' visual cortex to process data with 2-D convolutions. CNNs are mainly composed of convolutional layers, pooling layers, and fully connected layers. Convolutional layers are linear layers that share weights across space. Pooling layers are nonlinear layers that reduce the spatial size to limit the number of neurons. Fully connected layers are layers where every neuron is connected with all the neurons in the neighborhood layer. The target is the CMSIS-NN implementation on ARM Cortex-M3 with the same measurement setup as in previous experiments. As input, we target the CIFAR-10 data set that consists of $60,000~32\times32$ color images in ten classes. The CNN consists of three convolutional layers, three max-pooling layers, and one fully connected layer. We choose as target the multiplication operation from the input with the weight, similar as before. For this experiment, the operations on real values are performed using fixed-point arithmetic. For the pooling layer, once the weights in the convolution part are recovered, the output can be calculated. Since the max-pooling layer is based on the following conditional instruction, conditional(if(a > max) max = a), it is straightforward to differentiate it from the average pooling that has summation and division operations. This technique is then repeated to reverse engineer any number of convolutional and pooling layers. Finally, the fully connected layer is recovered in the same way as done for MLP. In our experiment, the original accuracy of the CNN equals 0.7847, and the accuracy of the recovered CNN is 0.7811. # Perspectives and long-term impact Physical attacks on machine learning and deep learning implementation have received growing interest from the research community. While this work [3] is one of the first works highlighting physical channel vulnerabilities on deep learning, it was validated on microcontrollers only. Nevertheless, it has motivated several directions for further research. A natural question arises regarding the feasibility of such attacks on other hardware platforms. Dubey et al. [7] presented the first practical model recovery attack on FPGA platforms, followed by a proposal to integrate masking as a countermeasure. Recently, it was also shown that model recovery attacks could also be performed remotely on multitenant FPGA [8], thus relaxing the requirement for physical access. Attacks on neural networks not only threaten the recovery of confidential models, 12 but the sensitive input can also be recovered with a similar approach [9]. Chmielewski and Weissbart [10] managed to reverse engineer implemented neural networks on Nvidia Jetson Nano, a module computer embedding a Tegra X1 SoC combining an ARM Cortex-A57 CPU and a 128-core GPU within a Maxwell architecture by using simple EM analysis. Furthermore, a side-channel in a server setting has threatened cloud-based model execution, as demonstrated by Wei et al. [11]. A comprehensive survey of SCA-based model recovery attacks is presented in [12]. The threat of model extraction attacks on neural networks has also driven prompt action from the industry. Vendors of neural network accelerators like Intel and Nvidia also now include features for model protection. Intel, under its OpenVINO framework, recommends the use of secure enclaves for sensitive model execution and provides features like model encryption. Several security add-ons features are available for vendors to enable the creation, distribution, and application of models in a secure setting. Nvidia, with their latest EGX100 platforms, have introduced the concept of Confidential AI enclaves to prevent IP theft. With the highlighted vulnerability from [3] and follow-up action from both academia as well as industry, the effort to protect sensitive machine learning models has gain momentum. Alongside, we also motivate research in solving these vulnerabilities with a holistic approach under the security by design paradigm. **OUR PREVIOUS WORK** [3] selected for Top Picks in Hardware and Embedded Security 2020 demonstrates that it is possible to reverse engineer neural networks by using side-channel attacks. We developed a framework that considers each part of the neural network separately and then, by combining the information, manages to reverse engineer all relevant hyperparameters and parameters. Our work is a proof of concept (but also a realistic demonstration) that such attacks are possible and warns that more effort should be given to developing countermeasures. While we have used microcontrollers for our experiments, the attack applies to other targets like FPGAs and GPUs. #### Acknowledgments This work was supported in part by the National Research Foundation, Singapore, under its National Cybersecurity Research and Development Programme/Cyber-Hardware Forensic and Assurance Evaluation R&D Programme under Award NRF2018NCR-NCR009-0001. # References - [1] Deloitte Insights. (2020). Bringing AI to the Device: Edge AI Chips Come Into Their Own. [Online]. Available: https://www2.deloitte.com/us/en/insights/industry/technology/technology-media-and-telecom-predictions/2020/ai-chips.html - [2] E. Strubell, A. Ganesh, and A. McCallum, "Energy and policy considerations for deep learning in NLP," CoRR, vol. abs/1906.02243, pp. 1–6, Jun. 2019. [Online]. Available: http://arxiv.org/abs/1906.02243 - [3] L. Batina et al., "CSINN: Reverse engineering of neural network architectures through electromagnetic side channel," in *Proc. 28th USENIX Secur. Symp. (USENIX Secur.)*, 2019, pp. 515–532. - [4] I. Goodfellow, Y. Bengio, and A. Courville, *Deep Learning*. Cambridge, MA, USA: MIT Press, 2016. [Online]. Available: http://www.deeplearningbook.org - [5] S. Mangard, E. Oswald, and T. Popp, Power Analysis Attacks: Revealing the Secrets of Smart Cards (Advances in Information Security). Berlin, Germany: Springer-Verlag, 2007. - [6] S. Picek et al., "The curse of class imbalance and conflicting metrics with machine learning for sidechannel evaluations," *IACR Trans. Cryptograph. Hardw. Embedded Syst.*, vol. 2019, no. 1, pp. 1–29, 2019. - [7] A. Dubey, R. Cammarota, and A. Aysu, "MaskedNet: The first hardware inference engine aiming power side-channel protection," in *Proc. IEEE Int. Symp. Hardw. Oriented Secur. Trust (HOST)*, Dec. 2020, pp. 197–208. - [8] S. Moini et al., "Power side-channel attacks on BNN accelerators in remote FPGAs," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 11, no. 2, pp. 357–370, Jun. 2021. - [9] L. Batina et al., "Recovering the input of neural networks via single shot side-channel attacks," in *Proc.* ACM SIGSAC Conf. Comput. Commun. Secur., 2019, pp. 2657–2659. - [10] Ł. Chmielewski and L. Weissbart, "On reverse engineering neural network implementation on GPU," in *Proc. Appl. Cryptogr. Netw. Secur. Workshops* (ACNS), vol. 12809, 2021, pp. 96–113, doi: 10.1007/978-3-030-81645-2\_7. July/August 2022 - [11] J. Wei et al., "Leaky DNN: Stealing deep-learning model secret with GPU context-switching sidechannel," in *Proc. 50th Annu. IEEE/IFIP Int. Conf. Dependable Syst. Netw. (DSN)*, Jun. 2020, pp. 125–137. - [12] H. Chabanne, J.-L. Danger, L. Guiga, and U. Kuhne, "Side channel attacks for architecture extraction of neural networks," *CAAI Trans. Intell. Technol.*, vol. 6, no. 1, pp. 3–16, 2021. [Online]. Available: https:// ietresearch.onlinelibrary.wiley.com/doi/abs/10.1049/ cit2.12026 **Lejla Batina** is a Professor of embedded systems security at Radboud University, Nijmegen, The Netherlands. Batina has a PhD from KU Leuven, Leuven, Belgium. She is a Senior Member of IEEE. **Shivam Bhasin** is a Senior Research Scientist and the Programme Manager of Cryptographic Engineering at Nanyang Technological University, Singapore. His research interests include embedded security and trusted computing. Bhasin has a master's degree from Mines Saint-Etienne, France, and a PhD from Telecom Paristech, France. **Dirmanto Jap** is a Research Scientist at PACE Laboratory, Temasek Laboratories, Nanyang Technological University (NTU), Singapore. His main research topics include physical attacks and countermeasures, practical fault injection, and application of machine learning for security. Jap has a PhD in mathematics from NTU. **Stjepan Picek** is an Assistant Professor at the Delft University of Technology, Delft, The Netherlands. His research interests include security, machine learning, and evolutionary algorithms. Picek has a PhD. ■ Direct questions and comments about this article to Dirmanto Jap, Nanyang Technological University, Singapore 637553; diap@ntu.edu.sg. 14 IEEE Design&Test