PUNet: Temporal Action Proposal Generation with Positive Unlabeled Learning using Key Frame Annotations

Master Thesis (2020)

Author(s)

N.U.S. Zia (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.C. van Gemert – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Osman Semih Kayhan – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Deep Learning, Computer vision, Action localization

To reference this document use:

https://resolver.tudelft.nl/uuid:505123cb-125b-4877-a159-94f8d49c58e6

More Info

Publication Year

2020

Language

English

Graduation Date

31-08-2020

Awarding Institution

Delft University of Technology

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A good action proposal method should generate proposals with high recall and high temporal overlap with the ground truth. The quality of the proposals depends on the labeled data available during training. Obtaining labeled data for untrimmed videos is a time-consuming, expensive and error-prone task. The resulting labels are also subjective, and the temporal bounds are inconsistent between human annotators. We propose using a single key frame label for each action instance, instead of start and end point labels, to generate temporal proposals. This reduces the number of labeled action frames in the dataset, leading to class imbalance. To overcome this, we replace the learning setting with a positive-unlabeled (PU) learning setup. We demonstrate that using key frames as labels gives high-quality proposals and yields results comparable to using full annotations, while being faster to annotate because the exact temporal bounds no longer need to be annotated. We evaluate our method on the THUMOS'14 and ActivityNet v1.2 datasets. Further experiments indicate that, by combining an existing action classifier with our proposals, our method achieves high mean average precision (mAP) for action localization.
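The two quality criteria named in the abstract, recall and temporal overlap with the ground truth, are commonly measured with temporal intersection-over-union (tIoU). The following is a minimal sketch of that standard definition, not code from the thesis; the names `temporal_iou` and `recall_at_tiou`, the `(start, end)` segment format and the default threshold of 0.5 are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed representation: a segment is (start, end) in seconds, with start <= end.
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Temporal intersection-over-union of two segments."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_tiou(
    proposals: List[Segment],
    ground_truth: List[Segment],
    threshold: float = 0.5,  # illustrative value, not taken from the thesis
) -> float:
    """Fraction of ground-truth segments matched by at least one proposal
    whose tIoU with that segment reaches the threshold."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for gt in ground_truth
        if any(temporal_iou(p, gt) >= threshold for p in proposals)
    )
    return hits / len(ground_truth)


if __name__ == "__main__":
    gt = [(0.0, 10.0), (20.0, 30.0)]
    props = [(1.0, 9.0), (40.0, 50.0)]
    print(temporal_iou(props[0], gt[0]))
    print(recall_at_tiou(props, gt))
```

In this toy example only the first ground-truth segment is covered by a proposal with sufficient overlap, so half of the action instances count as recalled.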

Files

Thesis_Report_NUSZia.pdf (pdf | 2.14 MB)

License info not available