Segmenting actions by aligning video frames to learned prototypes
D. Hoonhout (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Silvia Pintea – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Jan C. van Gemert – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Abstract
Video temporal action localization is the task of identifying and localizing specific actions or activities within a video stream. Instead of only classifying which actions occur in a video, we aim to detect when each action begins and ends. In this work, we focus on solving this task without any supervision. Existing unsupervised methods address the task by exploiting a combination of spatial and temporal information. We propose a new model that uses an MLP (multilayer perceptron) to learn to sample prototype frames from a video. We use the distance between the prototypes and the video frames, given by DTW (dynamic time warping), as a loss function to update the MLP. Combined with DTW, the sampled prototypes allow us to find the start and end boundaries of actions. Additionally, the prototype frames can be used for video summarization. We analyze our model in a controlled synthetic-data setup to show its strengths and weaknesses. Finally, we use the Breakfast dataset and the Cholec80 surgery dataset to compare our model to state-of-the-art models in a realistic scenario.
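To make the pipeline concrete, the sketch below illustrates the prototype-sampling and DTW-alignment idea described in the abstract. It assumes PyTorch and pre-extracted per-frame features; all names (ProtoSampler, soft_dtw, segment) and hyper-parameters (K = 5 prototypes, gamma, layer sizes) are illustrative choices, not the thesis's actual implementation. Because hard frame selection is not differentiable, the sketch soft-selects prototypes with a softmax over time and uses a soft-DTW recursion as the training loss; at inference, a hard DTW backtrace assigns each frame to a prototype, and action boundaries fall where the assignment changes.

```python
import torch
import torch.nn as nn

class ProtoSampler(nn.Module):
    """MLP that scores every frame for each of K prototype slots;
    prototypes are soft-selected so sampling stays differentiable."""
    def __init__(self, feat_dim, k):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, k))

    def forward(self, frames):               # frames: (T, D)
        scores = self.mlp(frames)            # (T, K)
        weights = scores.softmax(dim=0)      # soft selection over time
        return weights.t() @ frames          # (K, D) prototype features

def soft_dtw(protos, frames, gamma=0.1):
    """Differentiable DTW distance between the prototype sequence
    (K, D) and the frame sequence (T, D), used as the training loss."""
    K, T = protos.size(0), frames.size(0)
    cost = torch.cdist(protos, frames) ** 2  # (K, T) pairwise costs
    inf = torch.tensor(float("inf"))
    R = [[inf] * (T + 1) for _ in range(K + 1)]
    R[0][0] = torch.tensor(0.0)
    for i in range(1, K + 1):
        for j in range(1, T + 1):            # soft-min over the three DTW moves
            prev = torch.stack([R[i-1][j-1], R[i-1][j], R[i][j-1]])
            R[i][j] = cost[i-1, j-1] - gamma * torch.logsumexp(-prev / gamma, 0)
    return R[K][T]

@torch.no_grad()
def segment(protos, frames):
    """Hard DTW backtrace: assign each frame to a prototype and place
    an action boundary wherever the assignment changes."""
    K, T = protos.size(0), frames.size(0)
    cost = torch.cdist(protos, frames) ** 2
    D = torch.full((K + 1, T + 1), float("inf"))
    D[0, 0] = 0.0
    for i in range(1, K + 1):
        for j in range(1, T + 1):
            D[i, j] = cost[i-1, j-1] + torch.min(
                torch.stack([D[i-1, j-1], D[i-1, j], D[i, j-1]]))
    i, j, assign = K, T, [0] * T
    while i > 0 and j > 0:
        assign[j-1] = i - 1                  # frame j-1 aligns to prototype i-1
        step = torch.argmin(torch.stack([D[i-1, j-1], D[i-1, j], D[i, j-1]]))
        if step == 0:   i, j = i - 1, j - 1
        elif step == 1: i -= 1
        else:           j -= 1
    return [t for t in range(1, T) if assign[t] != assign[t-1]]

# Usage: random features stand in for real per-frame video features
# that would come from a pretrained backbone.
frames = torch.randn(200, 64)                # T=200 frames, D=64 dims
model = ProtoSampler(feat_dim=64, k=5)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(50):
    loss = soft_dtw(model(frames), frames)
    opt.zero_grad(); loss.backward(); opt.step()
print(segment(model(frames), frames))        # frame indices of boundaries
```

The quadratic dynamic-programming loops are kept deliberately naive for readability; a practical implementation would batch and vectorize them. The monotonicity of the DTW path is what makes the boundary rule work: each prototype covers one contiguous run of frames, so a change of assignment marks the transition between consecutive actions.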