Benchmarking Data Efficiency and Computational Efficiency of Temporal Action Localization Models

Conference Paper (2023)
Author(s)

J. Warchocki (Student TU Delft)

T. Oprescu (Student TU Delft)

Y. Wang (Student TU Delft)

A. Dămăcuș (Student TU Delft)

P.M. Misterka (Student TU Delft)

R. Bruintjes (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. Lengyel (TU Delft - Electrical Engineering, Mathematics and Computer Science)

O. Strafforello (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J.C. van Gemert (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group
Pattern Recognition and Bioinformatics
More Info
Publication Year
2023
Language
English
Pages (from-to)
3008-3016
Event
ICCV 2023: International Conference on Computer Vision (2023-10-02 - 2023-10-06), Paris, France
Collections
Institutional Repository
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In temporal action localization, given an input video, the goal is to predict which actions it contains, where they begin, and where they end. Training and testing current state-of-the-art deep learning models requires access to large amounts of data and computational power. However, gathering such data is challenging, and computational resources might be limited. This work explores and measures how current deep temporal action localization models perform in settings constrained by the amount of data or computational power. We measure data efficiency by training each model on a subset of the training set. We find that TemporalMaxer outperforms other models in data-limited settings. Furthermore, we recommend TriDet when training time is limited. To test the efficiency of the models during inference, we pass videos of different lengths through each model. We find that TemporalMaxer requires the least computational resources, likely due to its simple architecture.
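The two measurement protocols described in the abstract (training on a subset of the training set, and passing videos of different lengths through a model) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' benchmark code. It assumes nested random subsets drawn from one shuffled order, a model that consumes a video as a sequence of per-clip feature vectors, and median wall-clock time as the cost measure. The names `training_subsets` and `time_inference`, the fractions, and the feature dimension are all illustrative choices.

```python
import random
import statistics
import time
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def training_subsets(train_set: Sequence[T], fractions: Sequence[float],
                     seed: int = 0) -> Dict[float, List[T]]:
    """Hypothetical data-efficiency helper: one random subset per fraction.

    Assumption: subsets are nested (prefixes of a single shuffled order),
    so the 10% subset is contained in the 50% subset, and so on.
    """
    order = list(train_set)
    random.Random(seed).shuffle(order)  # fixed seed for reproducible subsets
    subsets = {}
    for f in fractions:
        if not 0.0 < f <= 1.0:
            raise ValueError(f"fraction must be in (0, 1], got {f}")
        k = max(1, round(f * len(order)))  # keep at least one video
        subsets[f] = order[:k]
    return subsets


def time_inference(model: Callable[[List[List[float]]], object],
                   lengths: Sequence[int], feat_dim: int = 8,
                   repeats: int = 3) -> Dict[int, float]:
    """Hypothetical inference-cost helper: median seconds per video length.

    Assumption: a video is represented as `n` clip feature vectors of size
    `feat_dim`; here they are synthetic zeros, since only cost is measured.
    """
    results = {}
    for n in lengths:
        video = [[0.0] * feat_dim for _ in range(n)]
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            model(video)  # output is discarded; only elapsed time matters
            times.append(time.perf_counter() - start)
        results[n] = statistics.median(times)
    return results


if __name__ == "__main__":
    subsets = training_subsets(range(100), [0.1, 0.5, 1.0])
    print({f: len(s) for f, s in subsets.items()})  # {0.1: 10, 0.5: 50, 1.0: 100}

    def toy_model(video):  # hypothetical stand-in for a localization model
        return [sum(clip) for clip in video]

    print(time_inference(toy_model, [128, 512, 2048]))
```

Nesting the subsets is one design choice among several: it makes results across fractions easier to compare, because each larger training set only adds videos. Independent draws per fraction would be an equally valid alternative under a different assumption.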

Files

License info not available