Benchmarking Data and Computational Efficiency of ActionFormer on Temporal Action Localization Tasks

Analysing the Performance and Generalizability of ActionFormer in Resource-constrained Environments

Bachelor Thesis (2023)
Author(s)

J. Warchocki (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.C. van Gemert – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

R. Bruintjes – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

A. Lengyel – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Ombretta Strafforello – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Petr Kellnhofer – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Jan Warchocki
More Info
Publication Year
2023
Language
English
Graduation Date
29-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In temporal action localization, the goal is to predict, for a given input video, which actions it contains and where each begins and ends. Current state-of-the-art deep learning models are trained and tested under the assumption of access to large amounts of data and computational power. Gathering such data is challenging, however, and access to computational resources may be limited. This work therefore measures how well one such model, ActionFormer, performs in settings constrained by the amount of data or computational power. Data efficiency was measured by training the model on a subset of the training set and testing on the test set. Although ActionFormer showed promising results on both the THUMOS'14 and ActivityNet datasets, the TriDet and TemporalMaxer models should likely be preferred over ActionFormer in limited-data settings, as they exhibit better data efficiency. Similarly, TriDet should be preferred over ActionFormer when training time is limited, as it showed better computational efficiency during training. To test the efficiency of the model during inference, videos of different lengths were passed through the model. Most importantly, we find that both the inference time and the memory usage of the model scale linearly with input video length, as predicted by the authors of ActionFormer.
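
The data-efficiency protocol described in the abstract (train on a subset of the training set, test on the unchanged test set) can be illustrated with a minimal sketch. The abstract does not state how the subsets were drawn, so uniform random sampling over videos, the function name `subsample_training_set`, the fractions, and the video identifiers below are all assumptions made for illustration, not the thesis's actual procedure.

```python
import random


def subsample_training_set(video_ids, fraction, seed=0):
    """Return a reproducible random subset of the training videos.

    Assumption: videos are sampled uniformly at random. The thesis may
    have used a different scheme (for example, per-class sampling).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    # Keep at least one video so that training remains possible.
    subset_size = max(1, round(fraction * len(video_ids)))
    return sorted(rng.sample(list(video_ids), subset_size))


# Hypothetical video identifiers standing in for a real training split.
train_ids = [f"video_{i:03d}" for i in range(200)]

for fraction in (0.1, 0.25, 0.5, 1.0):
    subset = subsample_training_set(train_ids, fraction, seed=42)
    # A model would be trained on `subset` here and then evaluated
    # on the full, unchanged test set.
    print(f"{fraction:.0%} of training data -> {len(subset)} videos")
```

Fixing the seed makes each subset reproducible, so runs at the same fraction see the same videos and differences in accuracy can be attributed to the amount of data rather than to which videos were drawn.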
