Benchmarking Data Efficiency and Computational Efficiency of Temporal Action Localization Models

Abstract

In temporal action localization, given an input video, the goal is to predict which actions it contains, where they begin, and where they end. Training and testing current state-of-the-art deep learning models requires access to large amounts of data and computational power. However, gathering such data is challenging, and computational resources may be limited. This work explores and measures how current deep temporal action localization models perform in settings constrained by the amount of data or computational power. We measure data efficiency by training each model on a subset of the training set, and find that TemporalMaxer outperforms the other models in data-limited settings. Furthermore, we recommend TriDet when training time is limited. To test the efficiency of the models during inference, we pass videos of different lengths through each model and find that TemporalMaxer requires the least computational resources, likely due to its simple architecture.
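To make the data-efficiency protocol described above concrete, the sketch below shows one way such a benchmark might be set up: train each model on a random subset of the training videos and record its localization performance. This is a minimal illustration, not the exact procedure used in the paper; `train_model` and `evaluate_mAP` are hypothetical placeholders standing in for a model's actual training and evaluation code (e.g., the TemporalMaxer or TriDet training scripts).

```python
import random


def benchmark_data_efficiency(train_video_ids, fractions, train_model, evaluate_mAP, seed=0):
    """Train on random subsets of the training videos and report performance per subset size."""
    rng = random.Random(seed)
    results = {}
    for frac in fractions:
        k = max(1, int(len(train_video_ids) * frac))
        subset = rng.sample(train_video_ids, k)  # fixed seed so subsets are comparable across models
        model = train_model(subset)              # hypothetical: train only on the subset
        results[frac] = evaluate_mAP(model)      # hypothetical: evaluate on the full test set
    return results


# Usage example with trivial stand-in stubs:
if __name__ == "__main__":
    ids = [f"video_{i:04d}" for i in range(200)]
    stub_train = lambda subset: {"n_train": len(subset)}
    stub_eval = lambda model: round(0.3 + 0.4 * model["n_train"] / 200, 3)
    print(benchmark_data_efficiency(ids, [0.1, 0.25, 0.5, 1.0], stub_train, stub_eval))
```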