PUNet: Temporal Action Proposal Generation with Positive Unlabeled Learning using Key Frame Annotations
N.U.S. Zia (TU Delft - Electrical Engineering, Mathematics and Computer Science)
J.C. van Gemert – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Osman Semih Kayhan – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
A good action proposal method should generate proposals with high recall and high temporal overlap with the ground truth. The quality of the proposals depends on the labeled data available during training. Obtaining labeled data for untrimmed videos is a time-consuming, expensive and error-prone task. The labels obtained are also subjective, and the temporal bounds are inconsistent between different human annotators. We propose using a single key frame label for each action instance, instead of start and end point labels, to generate temporal proposals. This reduces the number of labeled action frames in the dataset, leading to class imbalance. To overcome this, we replace the standard supervised learning setting with a positive-unlabeled (PU) learning setup, in which the annotated key frames are the labeled positives and the remaining frames are unlabeled. We demonstrate that using key frames as labels gives high-quality proposals and yields results comparable to using full annotations, while being faster to annotate because the exact temporal bounds no longer need to be marked. We evaluate our method on the THUMOS'14 and ActivityNet v1.2 datasets. Further experiments indicate that by combining an existing action classifier with our proposals, our method achieves high mean average precision (mAP) for action localization.
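To make the PU-learning setup more concrete, the following is a minimal sketch of one widely used PU objective, the non-negative PU risk estimator of Kiryo et al. (2017), applied to per-frame actionness scores. It is an illustration under stated assumptions, not the loss used in this work: the per-frame scores, the class prior `prior` (the assumed fraction of action frames), the choice of the sigmoid loss, and the function names `sigmoid_loss` and `nnpu_risk` are all hypothetical. Key frames act as the labeled positives and every other frame is treated as unlabeled.

```python
import math
from typing import AbstractSet, Sequence


def sigmoid_loss(score: float, label: int) -> float:
    """Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)); small when sign(z) agrees with y."""
    return 1.0 / (1.0 + math.exp(label * score))


def nnpu_risk(scores: Sequence[float], key_frames: AbstractSet[int], prior: float) -> float:
    """Non-negative PU risk (Kiryo et al., 2017) over hypothetical per-frame scores.

    scores     -- one real-valued actionness score per frame (assumed model output)
    key_frames -- indices of annotated key frames, used as the labeled positives
    prior      -- assumed class prior pi, i.e. the fraction of action frames
    """
    pos = [scores[i] for i in key_frames]
    # Non-key frames form the unlabeled set; because key frames are sparse,
    # this approximates a sample from the overall frame distribution.
    unl = [s for i, s in enumerate(scores) if i not in key_frames]
    if not pos or not unl:
        raise ValueError("need at least one key frame and one unlabeled frame")

    # Risk of classifying the labeled positives as positive.
    r_pos_as_pos = sum(sigmoid_loss(s, +1) for s in pos) / len(pos)
    # Risk of classifying the labeled positives as negative.
    r_pos_as_neg = sum(sigmoid_loss(s, -1) for s in pos) / len(pos)
    # Risk of classifying the unlabeled frames as negative.
    r_unl_as_neg = sum(sigmoid_loss(s, -1) for s in unl) / len(unl)

    # The estimated negative-class risk is clamped at zero, which keeps a
    # flexible model from overfitting by driving that term below zero.
    negative_risk = max(0.0, r_unl_as_neg - prior * r_pos_as_neg)
    return prior * r_pos_as_pos + negative_risk
```

For example, `nnpu_risk([3.0, -2.0, -2.5, 2.5, -3.0], {0}, 0.4)` scores one key frame confidently as an action. In this case, the unlabeled negative-risk estimate would fall below zero, so the clamp removes it. The original paper also modifies the gradient step when this term goes negative during training. That detail is omitted here.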