Benchmarking Data and Computational Efficiency of ActionFormer on Temporal Action Localization Tasks

Analysing the Performance and Generalizability of ActionFormer in Resource-constrained Environments

Abstract

In temporal action localization, the goal is to predict which actions an input video contains and where each action begins and ends. Current state-of-the-art deep learning models are trained and tested under the assumption of access to large amounts of data and computational power. Gathering such data is, however, a challenging task, and access to computational resources may be limited. This work therefore explores and measures how well one such model, ActionFormer, performs in settings constrained by the amount of data or computational power. Data efficiency was measured by training the model on subsets of the training set and evaluating on the full test set. Although ActionFormer showed promising results on both the THUMOS'14 and ActivityNet datasets, the TriDet and TemporalMaxer models should likely be chosen over ActionFormer in limited-data settings, as they exhibit better data efficiency. Similarly, TriDet should be chosen over ActionFormer when training time is limited, as it showed better computational efficiency during training. To test the efficiency of the model during inference, videos of different lengths were passed through it. Most importantly, we find that both the inference time and the memory usage of the model scale linearly with input video length, as predicted by the authors of ActionFormer.
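
As a minimal sketch of the data-efficiency protocol described above (not the thesis code; all names here are hypothetical), one could draw reproducible random subsets of the training videos and train one model per subset size, always evaluating on the full test set:

    import random

    # Each element of `annotations` is assumed to be one video's annotation
    # record, as loaded from the THUMOS'14 or ActivityNet annotation files.
    def make_subset(annotations, fraction, seed=0):
        """Return a random subset containing `fraction` of the training videos."""
        rng = random.Random(seed)  # fixed seed so subsets are reproducible
        k = max(1, int(len(annotations) * fraction))
        return rng.sample(annotations, k)

    # Usage: train one model per fraction, evaluating each on the full test set.
    annotations = [{"video_id": f"video_{i:04d}"} for i in range(200)]  # dummy records
    for fraction in (0.1, 0.25, 0.5, 0.75, 1.0):
        subset = make_subset(annotations, fraction)
        print(f"{fraction:.0%} of training set -> {len(subset)} videos")
        # train_and_evaluate(subset)  # placeholder for the actual training run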
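
Similarly, a sketch of how the inference-time and memory measurements could be taken with a PyTorch model, assuming the model accepts a tensor of pre-extracted clip features shaped (1, T, C); the model handle, feature dimension, and sequence lengths below are placeholders, not the thesis setup:

    import time

    import torch

    def profile_inference(model, feat_dim=2304, lengths=(256, 512, 1024, 2048, 4096)):
        """Measure wall-clock time and peak GPU memory for increasing input lengths T."""
        model.eval()
        device = next(model.parameters()).device
        results = []
        for T in lengths:
            feats = torch.randn(1, T, feat_dim, device=device)  # dummy features
            if device.type == "cuda":
                torch.cuda.reset_peak_memory_stats(device)
                torch.cuda.synchronize(device)  # exclude queued work from the timing
            start = time.perf_counter()
            with torch.no_grad():
                model(feats)
            if device.type == "cuda":
                torch.cuda.synchronize(device)  # wait for the forward pass to finish
            elapsed = time.perf_counter() - start
            peak_mb = (torch.cuda.max_memory_allocated(device) / 2**20
                       if device.type == "cuda" else float("nan"))
            results.append((T, elapsed, peak_mb))
            print(f"T={T:5d}  time={elapsed:.3f}s  peak_mem={peak_mb:.0f} MiB")
        return results

Plotting the returned (T, time, memory) triples against T makes the claimed linear scaling directly visible: both curves should be well fit by a straight line.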