Evaluation of Video Summarization Using Fully Convolutional Sequence Networks on Action Localization Datasets


Abstract

In the problem of video summarization, the goal is to select a subset of the input frames that conveys the most important information of the input video. Collecting data for this task proves challenging, in part because human annotators disagree on which segments of a video should be considered important for a summary. In this study we analyse a new dataset created with the goal of increasing agreement between human annotators. The dataset has been created with a novel annotation method, which uses existing action localization labels for segmenting the videos. We train a supervised and an unsupervised deep learning framework on popularly used benchmark datasets and on the new dataset. Experimental results show the effectiveness of this novel summary annotation method in improving the agreement between annotators. Analysis reveals some issues with the evaluation of the deep learning framework.