Group Equivariant Video Action Recognition

Making action-recognition networks equivariant to temporal direction and discrete spatial rotations

Master thesis (2021)

Authors

D. Basu Electrical Engineering, Mathematics and Computer Science

Contributors

J.C. van Gemert Pattern Recognition and Bioinformatics - (mentor)

C. Lofi Web Information Systems - (graduation committee member)

O. Strafforello Pattern Recognition and Bioinformatics - (coach)

Faculty

Electrical Engineering, Mathematics and Computer Science

Deep Learning, Computer Vision, Action Recognition, Group equivariance

To reference this document use:

http://resolver.tudelft.nl/uuid:d69313b9-adb0-4440-9b27-24bb2e30bf96

Published Date

22-12-2021

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This work applies the theory of group equivariance to video action recognition, replacing standard 3D convolutions with group convolutions that are equivariant to temporal direction and to spatial rotations by multiples of 90 degrees. We propose a temporal direction symmetry group T2 and extend the standard planar rotation group to three dimensions, forming a 3D group whose convolutions are equivariant to discrete 90-degree spatial rotations. We analyse the efficacy of these 3D-G-CNNs as drop-in replacements in 3D networks by evaluating them on synthesized datasets containing handwritten MNIST digits moving over a black background, as well as on the popular action recognition datasets UCF-101 and HMDB-51, and by comparing the results against the performance of standard 3D CNNs on the same datasets.
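
The equivariance property named in the abstract can be illustrated with a small NumPy sketch. This is not the thesis implementation: the helper names `act`, `correlate_valid`, and `lifting_correlation` are hypothetical, the video is a single-channel random array, and the group is assumed to be the eight combinations of an optional time reversal with a spatial rotation by a multiple of 90 degrees. The sketch checks that transforming both the video and the filter by the same group element transforms the correlation output in the same way.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Assumed group: (flip_t, rot) with flip_t in {0, 1} and rot in {0, 1, 2, 3}.
GROUP = [(flip_t, rot) for flip_t in (0, 1) for rot in range(4)]


def act(x, flip_t, rot):
    """Apply a group element to a (T, H, W) array.

    Optionally reverses the time axis, then rotates each frame by
    rot * 90 degrees in the spatial (H, W) plane.
    """
    if flip_t:
        x = x[::-1]
    return np.rot90(x, k=rot, axes=(1, 2))


def correlate_valid(x, k):
    """Plain 'valid' 3D cross-correlation of a video x with a filter k."""
    windows = sliding_window_view(x, k.shape)  # (T', H', W', kt, kh, kw)
    return np.einsum("thwabc,abc->thw", windows, k)


def lifting_correlation(x, k):
    """Correlate x with every transformed copy of k, one output map per group element."""
    return [correlate_valid(x, act(k, flip_t, rot)) for flip_t, rot in GROUP]


rng = np.random.default_rng(0)
video = rng.standard_normal((6, 8, 8))   # toy video: 6 frames of 8x8 pixels
kernel = rng.standard_normal((3, 3, 3))  # toy spatio-temporal filter

# Equivariance check: transform(input) correlated with transform(filter)
# equals transform(output) for every group element.
equivariant = all(
    np.allclose(
        correlate_valid(act(video, f, r), act(kernel, f, r)),
        act(correlate_valid(video, kernel), f, r),
    )
    for f, r in GROUP
)
print("equivariant:", equivariant)
```

Because `lifting_correlation` applies every transformed copy of the filter, reversing or rotating the input only reorders and transforms its output maps; pooling over space, time, and the group axis would then give a response that does not depend on playback direction or 90-degree orientation.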

Files

Group_Equivariant_Video_Action... (.pdf)

(.pdf | 7.55 Mb)