Distributed Radar-based Human Activity Recognition using Vision Transformer and CNNs
Yubin Zhao (Student TU Delft)
Ronny Gerhard Guendel (TU Delft - Microwave Sensing, Signals & Systems)
A. G. Yarovyi (TU Delft - Microwave Sensing, Signals & Systems)
Francesco Fioranelli (TU Delft - Microwave Sensing, Signals & Systems)
Abstract
The feasibility of classifying human activities measured by a distributed ultra-wideband (UWB) radar system, using Range-Doppler (RD) images as the input to the classifiers, is investigated. The kinematic characteristics of different human activities are expected to be captured in the high-resolution RD images measured by UWB radars. To construct the dataset, five distributed monostatic Humatics P410 radars are used to record 15 participants performing 9 activities in arbitrary directions along a designated trajectory. For the first time, a convolution-free neural network based on the multi-head attention mechanism, the Vision Transformer architecture, is adopted as the classifier, attaining an accuracy of 76.5 %. A comparison between the Vision Transformer and more conventional CNN-based architectures, such as ResNet and AlexNet, is also conducted. The robustness of the Vision Transformer and of the other networks to unseen participants is validated via Leave One Participant Out testing.
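The Leave One Participant Out protocol mentioned in the abstract can be sketched as follows: the classifier is trained on data from all participants except one, tested on the held-out participant, and the process is repeated for every participant. This is a minimal illustration in plain Python; the sample layout and participant IDs are hypothetical, not taken from the actual dataset.

```python
def lopo_splits(participant_ids):
    """Yield (held_out, train_indices, test_indices) triples,
    holding out one participant per fold (Leave One Participant Out)."""
    for held_out in sorted(set(participant_ids)):
        train = [i for i, p in enumerate(participant_ids) if p != held_out]
        test = [i for i, p in enumerate(participant_ids) if p == held_out]
        yield held_out, train, test

# Toy example: six RD-image samples recorded from three participants.
ids = ["p1", "p1", "p2", "p2", "p3", "p3"]
for held_out, train, test in lopo_splits(ids):
    print(held_out, train, test)
```

With 15 participants, this yields 15 train/test folds, each testing on a participant the network has never seen, which is what makes the protocol a check of robustness to unseen subjects.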