Title: Benchmarking Data and Computational Efficiency of ActionFormer on Temporal Action Localization Tasks: Analysing the Performance and Generalizability of ActionFormer in Resource-constrained Environments
Author: Warchocki, Jan (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: van Gemert, J.C. (mentor); Bruintjes, R. (mentor); Lengyel, A. (mentor); Strafforello, O. (mentor); Kellnhofer, P. (graduation committee)
Degree granting institution: Delft University of Technology
Corporate name: Delft University of Technology
Programme: Computer Science and Engineering
Project: CSE3000 Research Project
Date: 2023-06-29

Abstract: In temporal action localization, given an input video, the goal is to predict which actions it contains, where they begin, and where they end. Training and testing current state-of-the-art deep learning models assumes access to large amounts of data and computational power. Gathering such data is, however, a challenging task, and access to computational resources might be limited. This work thus explores and measures how well one such deep learning model, ActionFormer, performs in settings constrained by the amount of data or computational power. Data efficiency was measured by training the model on a subset of the training set and testing on the test set. Although ActionFormer showed promising results on both the THUMOS'14 and ActivityNet datasets, the TriDet and TemporalMaxer models should likely be chosen over ActionFormer in limited-data settings, as they exhibit better data efficiency. Similarly, the TriDet model should be chosen over ActionFormer in cases where the training time is limited, as it showed better computational efficiency during training. To test the efficiency of the model during inference, videos of different lengths were passed through the model.
Most importantly, we find that both the inference time and the memory usage of the model scale linearly with input video length, as predicted by the authors of ActionFormer.

Subject: temporal action localization; action recognition; Transformers; data efficiency; computational efficiency
To reference this document use: http://resolver.tudelft.nl/uuid:06ebfc25-57ed-4071-a23a-3b69c3ca2126
Part of collection: Student theses
Document type: bachelor thesis
Rights: © 2023 Jan Warchocki
Files: Jan_Warchocki_Bachelor_Thesis.pdf (PDF, 424.42 KB)