Group Equivariant Video Action Recognition

Making action-recognition networks equivariant to temporal direction and discrete spatial rotations

Master Thesis (2021)

Author(s)

D. Basu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.C. van Gemert – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

C. Lofi – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

O. Strafforello – Coach (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Deep Learning, Computer Vision, Action Recognition, Group equivariance

To reference this document use

https://resolver.tudelft.nl/uuid:d69313b9-adb0-4440-9b27-24bb2e30bf96

More Info

Publication Year

2021

Language

English

Graduation Date

22-12-2021

Awarding Institution

Delft University of Technology

Programme

Computer Science

Downloads counter

241

Collections

thesis

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This work applies the theory of group equivariance to video action recognition, replacing standard 3D convolutions with group convolutions that are equivariant to temporal direction and to spatial rotations by multiples of 90 degrees. We propose a temporal direction symmetry group T2 and extend the standard planar rotation group to three dimensions, forming a 3D group that is equivariant to discrete 90-degree spatial rotations. We analyse the efficacy of these 3D-G-CNNs as drop-in replacements in 3D networks by evaluating them on synthesized datasets containing handwritten MNIST digits moving over a black background, as well as on the popular action recognition datasets UCF-101 and HMDB-51, and by comparing the results against the performance of standard 3D CNNs on the same datasets.
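As a rough illustration of the idea in the abstract, the following NumPy sketch correlates a video with every transformed copy of a single 3D filter and then pools over the result. It is not the thesis implementation. The function names (`transform`, `correlate3d`, `lifting_correlation`, `invariant_feature`) are hypothetical, and combining temporal flips with 90-degree rotations into one eight-element group is an assumption made here for brevity.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def transform(x, flip, rot):
    """Apply one group element to a (T, H, W) array:
    rotate each frame by rot * 90 degrees, then optionally reverse time."""
    x = np.rot90(x, k=rot, axes=(1, 2))
    return x[::-1] if flip else x


def correlate3d(video, kernel):
    """Valid-mode 3D cross-correlation of a (T, H, W) video with a (t, h, w) kernel."""
    windows = sliding_window_view(video, kernel.shape)  # (T', H', W', t, h, w)
    return np.einsum("abcijk,ijk->abc", windows, kernel)


def lifting_correlation(video, kernel):
    """Correlate the video with all 8 transformed copies of the kernel
    (2 temporal directions x 4 rotations). The kernel must be spatially
    square so that every response map has the same shape."""
    return np.stack([
        correlate3d(video, transform(kernel, flip, rot))
        for flip in (0, 1)
        for rot in range(4)
    ])


def invariant_feature(video, kernel):
    """Max-pool over group elements, time, and space. The result is unchanged
    when the input video is time-reversed or rotated by a multiple of 90 degrees."""
    return lifting_correlation(video, kernel).max()
```

Transforming the input only permutes the eight response maps and transforms each of them in the same way, so pooling over all of them yields a value that does not depend on the playback direction or on the 90-degree orientation of the clip. A single untransformed filter followed by the same pooling does not have this property in general.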

Files

Group_Equivariant_Video_Action... (pdf)

(pdf | 7.55 Mb)

License info not available