Group Equivariant Video Action Recognition

Making action-recognition networks equivariant to temporal direction and discrete spatial rotations

Master Thesis (2021)
Author(s)

D. Basu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.C. van Gemert – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

C. Lofi – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

O. Strafforello – Coach (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2021
Language
English
Graduation Date
22-12-2021
Awarding Institution
Delft University of Technology
Programme
Computer Science
Downloads counter
241
Collections
thesis
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This work applies the theory of group equivariance to video action recognition by replacing standard 3D convolutions with group convolutions that are equivariant to temporal direction and to spatial rotations by multiples of 90 degrees. We propose a temporal direction symmetry group T2 and extend the standard planar rotation group to three dimensions to form a 3D group that captures discrete 90-degree spatial rotations. We analyse the efficacy of these 3D-G-CNNs as drop-in replacements in 3D networks by evaluating them on synthesized datasets of handwritten MNIST digits moving over a black background, as well as on the popular action recognition datasets UCF-101 and HMDB-51, and by comparing the results against the performance of standard 3D CNNs on the same datasets.
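The symmetry described in the abstract can be illustrated with a small NumPy toy. This is a sketch under stated assumptions, not the thesis implementation. It treats a clip as a (T, H, W) array with H equal to W, applies the eight transformations obtained by combining time reversal with 90-degree spatial rotations, and checks that transforming the input only permutes the responses to the transformed kernels. The function names `orbit` and `lifted_responses` are hypothetical and chosen for illustration only.

```python
import numpy as np

def orbit(x):
    """Return the 8 copies of a (T, H, W) array under time reversal
    combined with spatial rotations by multiples of 90 degrees."""
    out = []
    for reverse in (False, True):
        v = x[::-1] if reverse else x          # flip the temporal axis
        for k in range(4):
            # rotate each frame by k * 90 degrees in the (H, W) plane
            out.append(np.rot90(v, k=k, axes=(1, 2)))
    return out

def lifted_responses(clip, kernel):
    """Inner product of the clip with every transformed copy of the kernel
    (a toy stand-in for one output position of a lifting group convolution)."""
    return np.array([np.sum(clip * g_kernel) for g_kernel in orbit(kernel)])

rng = np.random.default_rng(0)
clip = rng.standard_normal((4, 5, 5))    # assumed toy clip: 4 frames of 5x5
kernel = rng.standard_normal((4, 5, 5))  # assumed toy kernel of the same shape

base = lifted_responses(clip, kernel)
# Transform the input: reverse time, then rotate every frame by 90 degrees.
moved = lifted_responses(np.rot90(clip[::-1], k=1, axes=(1, 2)), kernel)

# The responses are the same values in a different order (equivariance),
# so pooling over the group axis gives an invariant feature.
same_up_to_permutation = np.allclose(np.sort(base), np.sort(moved))
pooled_invariant = np.isclose(base.max(), moved.max())
```

A standard 3D convolution would use only the untransformed kernel, so its response would generally change when the clip is reversed or rotated. Sharing one kernel across all eight transformed copies is what makes the pooled response independent of those transformations in this toy setting.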

Files

License info not available