Loss functions for profiled side-channel analysis

An analysis of loss functions and the application of multi-loss functions for deep learning in the SCA domain


Abstract

Deep learning techniques have become the tool of choice for side-channel analysis (SCA). In recent years, neural networks such as multi-layer perceptrons and convolutional neural networks have proven to be the most powerful instruments for performing side-channel analysis. Recent work on this topic has focused on different aspects of these techniques: improving the performance of the resulting models, reducing their complexity, or finding new well-performing architectures. One part of these neural networks that has received relatively little attention is the loss function. Loss functions play a key role in how these networks learn, since every weight in the network is updated to minimise the loss computed by the loss function. Work on loss functions in other fields where deep learning is used shows that the choice of loss function affects the performance of the resulting models. Although two novel loss functions have been proposed specifically for side-channel analysis, no broad analysis of the performance of different loss functions has been carried out in this context.
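To make concrete how the loss function steers learning, here is a minimal sketch assuming a single-layer softmax classifier in NumPy (not one of the architectures studied in this work). It computes categorical cross-entropy and repeatedly applies a gradient-descent step that moves every weight in the direction that lowers that loss. The helper names `softmax`, `categorical_cross_entropy` and `sgd_step`, the learning rate, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_cross_entropy(probs, labels):
    # Mean negative log-probability assigned to the true class of each example.
    n = labels.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))

def sgd_step(W, b, X, labels, lr=0.1):
    # One gradient-descent step on a linear softmax model: logits = X @ W + b.
    num_classes = W.shape[1]
    n = X.shape[0]
    probs = softmax(X @ W + b)
    one_hot = np.eye(num_classes)[labels]
    # For softmax followed by cross-entropy, dLoss/dlogits = (probs - one_hot) / n.
    grad_logits = (probs - one_hot) / n
    W_new = W - lr * X.T @ grad_logits
    b_new = b - lr * grad_logits.sum(axis=0)
    return W_new, b_new

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))         # 64 synthetic "traces" with 10 samples each
labels = rng.integers(0, 4, size=64)  # 4 synthetic classes (not real SCA labels)
W = np.zeros((10, 4))
b = np.zeros(4)

before = categorical_cross_entropy(softmax(X @ W + b), labels)
for _ in range(50):
    W, b = sgd_step(W, b, X, labels)
after = categorical_cross_entropy(softmax(X @ W + b), labels)
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

With all-zero weights every class gets probability 1/4, so the initial loss is ln 4; each step then reduces it. Swapping in a different loss changes the gradient `grad_logits`, and with it how every weight is updated.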
In this work, we provide such a broad comparison of loss functions in the context of side-channel analysis and examine how they impact the performance of the resulting models. We show that novel, application-specific loss functions almost always outperform the current standard, categorical cross-entropy. We also show that state-of-the-art (multi-)loss functions from other domains can be successfully applied to side-channel analysis. Finally, we provide an overview of the strengths and weaknesses of the different loss functions in various side-channel analysis scenarios and use these insights to introduce our own novel loss function, the focal loss ratio. We show that this new loss function, based on characteristics of other well-performing loss functions, outperforms the previous best function in most SCA scenarios.
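This abstract does not define the focal loss ratio, so the sketch below does not attempt to reproduce it. As background only, it contrasts the categorical cross-entropy baseline with the standard focal loss of Lin et al. (2017), a well-known loss from computer vision, using the per-example formula FL(p_t) = -(1 - p_t)^γ · log(p_t), where p_t is the predicted probability of the correct class. The function names and the choice γ = 2 are illustrative assumptions.

```python
import numpy as np

def categorical_cross_entropy(p_true):
    # p_true: predicted probability of the correct class for each example.
    return -np.log(p_true)

def focal_loss(p_true, gamma=2.0):
    # Focal loss (Lin et al., 2017): scales cross-entropy by (1 - p_t)^gamma,
    # so confidently correct examples contribute far less to the total loss.
    return -((1.0 - p_true) ** gamma) * np.log(p_true)

p = np.array([0.9, 0.5, 0.1])  # confident, uncertain, and badly wrong predictions
ce = categorical_cross_entropy(p)
fl = focal_loss(p)
for pt, c, f in zip(p, ce, fl):
    print(f"p_t={pt:.1f}  CE={c:.4f}  focal={f:.4f}")
```

The modulating factor (1 - p_t)^γ shrinks the loss for the confident prediction by a factor of 100 but leaves the badly wrong one nearly unchanged, which shifts the training signal toward hard examples.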