Benchmarking Data Efficiency and Computational Efficiency of Temporal Action Localization Models

Conference Paper (2023)
Author(s)

J. Warchocki (Student TU Delft)

T. Oprescu (Student TU Delft)

Y. Wang (Student TU Delft)

A. Dămăcuș (Student TU Delft)

P.M. Misterka (Student TU Delft)

R. Bruintjes (TU Delft - Pattern Recognition and Bioinformatics)

A. Lengyel (TU Delft - Pattern Recognition and Bioinformatics)

O. Strafforello (TU Delft - Pattern Recognition and Bioinformatics)

J.C. van Gemert (TU Delft - Pattern Recognition and Bioinformatics)

Research Group
Pattern Recognition and Bioinformatics
Copyright
© 2023 J. Warchocki, T. Oprescu, Y. Wang, A. Dămăcuș, P.M. Misterka, R. Bruintjes, A. Lengyel, O. Strafforello, J.C. van Gemert
Publication Year
2023
Language
English
Pages (from-to)
3008-3016
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward, or distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In temporal action localization, given an input video, the goal is to predict which actions it contains, where they begin, and where they end. Training and testing current state-of-the-art deep learning models require access to large amounts of data and computational power. However, gathering such data is challenging and computational resources may be limited. This work explores and measures how current deep temporal action localization models perform in settings constrained by the amount of data or computational power. We measure data efficiency by training each model on a subset of the training set. We find that TemporalMaxer outperforms the other models in data-limited settings. Furthermore, we recommend TriDet when training time is limited. To test the efficiency of the models during inference, we pass videos of different lengths through each model. We find that TemporalMaxer requires the least computational resources, likely due to its simple architecture.
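The two measurement protocols described in the abstract, training on subsets of the training set and timing inference on videos of different lengths, can be illustrated with a short sketch. This is a minimal PyTorch illustration, not the authors' benchmarking code: DummyLocalizer, the feature dimension, and the snippet counts are hypothetical placeholders, assuming models operate on pre-extracted snippet features.

import time
import torch
import torch.nn as nn

# Hypothetical stand-in for a temporal action localization model;
# the models benchmarked in the paper (e.g. TemporalMaxer, TriDet)
# have their own published codebases.
class DummyLocalizer(nn.Module):
    def __init__(self, feat_dim=2304, num_classes=20):
        super().__init__()
        self.head = nn.Conv1d(feat_dim, num_classes, kernel_size=3, padding=1)

    def forward(self, feats):  # feats: (batch, feat_dim, num_snippets)
        return self.head(feats)

def subsample_training_set(dataset, fraction, seed=0):
    """Randomly keep `fraction` of the training videos (data-efficiency setting)."""
    g = torch.Generator().manual_seed(seed)
    n_keep = max(1, int(len(dataset) * fraction))
    idx = torch.randperm(len(dataset), generator=g)[:n_keep].tolist()
    return torch.utils.data.Subset(dataset, idx)

@torch.no_grad()
def time_inference(model, num_snippets, feat_dim=2304, repeats=10):
    """Average forward-pass time for a video of a given snippet length."""
    model.eval()
    x = torch.randn(1, feat_dim, num_snippets)
    model(x)  # warm-up pass before timing
    start = time.perf_counter()
    for _ in range(repeats):
        model(x)
    return (time.perf_counter() - start) / repeats

model = DummyLocalizer()
for n in (256, 1024, 4096):  # proxy for videos of increasing length
    print(f"{n} snippets: {time_inference(model, n) * 1e3:.1f} ms")

In this setup, data efficiency is read off by retraining on each subsampled set and comparing localization accuracy, while inference efficiency is the trend of runtime against input length.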
