Multimodal Deep Learning for the Classification of Human Activity

Radar and video data fusion for the classification of human activity

Master Thesis (2019)
Author(s)

R.J. de Jong (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Faruk Uysal – Mentor

Olexander Yarovoy – Mentor

Jacco de Wit – Mentor

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2019 Richard de Jong
Publication Year
2019
Language
English
Graduation Date
14-01-2019
Awarding Institution
Delft University of Technology
Sponsors
TNO
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Persistent surveillance is an urgently needed capability. For security, surveillance cameras are a strong asset, as they support the automatic tracking of people and their output is directly interpretable by a human operator. Radar, on the other hand, can be used under a broad range of circumstances: it penetrates media such as clouds, fog, mist and snow, and it remains usable in darkness.
Radar data, however, are not as easily interpretable by a human operator as the output of an optical sensor such as a video camera. This thesis explores the potential of multimodal deep learning with a radar and a video sensor to improve the classification accuracy of human activity. A recorded and labelled dataset is created that contains three different human activities: walking, walking with a metal pole, and walking with a backpack (10 kg). A Single Shot Detector is used to process the video data. The cropped frames are then associated with the start of a radar micro-Doppler signature with a duration of 1.28 seconds. The dataset is split into a training set (80 %) and a validation set (20 %) such that no data from a person in the training set appears in the validation set. Implementations of convolutional neural networks for the video frames and the micro-Doppler signatures obtain classification accuracies of 85.78 % and 63.12 %, respectively, for the aforementioned activities. It was not possible to distinguish a person walking from a person walking with a backpack on the basis of the micro-Doppler signatures alone. The synchronised dataset is used to investigate different fusion methods. Both early and late fusion methods show an improvement in classification accuracy. The best early fusion model achieves a classification accuracy of 90.60 %. Omitting the radar data, however, lowers the classification accuracy by just 0.9 %, identifying the video data as the dominant modality in this particular setup.
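
The person-wise 80/20 split described above can be illustrated with scikit-learn's GroupShuffleSplit, which guarantees that no subject contributes samples to both sets. This is a minimal sketch; the sample and subject arrays are placeholders, not the thesis data.

    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    X = np.arange(10)                                        # sample indices (placeholder)
    person_ids = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])    # subject id per sample (placeholder)

    # 80/20 split on subjects, so each person ends up entirely in one set.
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
    train_idx, val_idx = next(splitter.split(X, groups=person_ids))

The early and late fusion approaches compared in the abstract can be sketched in TensorFlow/Keras as follows. All input shapes, layer sizes and names are illustrative assumptions, not the architecture used in the thesis: early fusion concatenates the feature vectors of the two modalities before classification, while late fusion averages the class probabilities of two per-modality classifiers.

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    NUM_CLASSES = 3  # walking, walking with a metal pole, walking with a backpack

    def branch(input_shape, name):
        """Small CNN feature extractor for one modality (sizes assumed)."""
        inp = layers.Input(shape=input_shape, name=f"{name}_input")
        x = layers.Conv2D(16, 3, activation="relu")(inp)
        x = layers.MaxPooling2D()(x)
        x = layers.Conv2D(32, 3, activation="relu")(x)
        x = layers.GlobalAveragePooling2D()(x)
        return inp, x

    # Cropped video frame and micro-Doppler spectrogram (shapes assumed).
    video_in, video_feat = branch((128, 128, 3), "video")
    radar_in, radar_feat = branch((64, 128, 1), "radar")

    # Early fusion: concatenate per-modality features, classify jointly.
    fused = layers.Concatenate()([video_feat, radar_feat])
    early_out = layers.Dense(NUM_CLASSES, activation="softmax")(fused)
    early_model = Model([video_in, radar_in], early_out, name="early_fusion")

    # Late fusion: separate per-modality classifiers, average their probabilities.
    video_prob = layers.Dense(NUM_CLASSES, activation="softmax")(video_feat)
    radar_prob = layers.Dense(NUM_CLASSES, activation="softmax")(radar_feat)
    late_out = layers.Average()([video_prob, radar_prob])
    late_model = Model([video_in, radar_in], late_out, name="late_fusion")

    early_model.compile(optimizer="adam", loss="categorical_crossentropy",
                        metrics=["accuracy"])

Averaging softmax outputs is only one simple late fusion rule; weighted averaging or a small classifier on the stacked probabilities are common alternatives.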

Files

Multimodal_Deep_Learning_for_t... (pdf)
(pdf | 5.86 MB)
- Embargo expired on 30-11-2019
License info not available