Time-Efficient Video Annotation with t-SNE
Soroosh Poorgholi (TU Delft - Electrical Engineering, Mathematics and Computer Science)
J.C. van Gemert – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
O.S. Kayhan – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
M. Loog – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
T. Höllt – Graduation committee member (TU Delft - Computer Graphics and Visualisation)
Abstract
Video understanding has received more attention in the past few years due to the availability of several large-scale video datasets and improvements in computational power. However, annotating large-scale video datasets is cost-intensive due to their complexity. In this work, we propose a time-efficient video annotation method that uses spatio-temporal feature similarity and t-SNE dimensionality reduction to make the annotation process more efficient. Placing the same actions from different videos near each other in two-dimensional space, based on feature similarity, helps the oracle group-label the video clips. We evaluate the performance of our method on two subsets of the ActivityNet (v1.3) dataset. We show that our method can outperform conventional video labeling tools time-wise while maintaining reasonable test accuracy on the video classification task compared to the ground-truth labels. To further evaluate the generalization of our method, we test our method on the Sports-1M and Breakfast datasets.
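The core idea described in the abstract (projecting per-clip spatio-temporal features to 2D with t-SNE so that similar actions cluster for group labeling) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the feature extractor is replaced by random vectors standing in for the output of a spatio-temporal video network, and scikit-learn's `TSNE` is assumed as the dimensionality-reduction tool.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for real clip features: in the thesis, each video clip is
# encoded by a spatio-temporal network; here we draw random 512-D vectors.
rng = np.random.default_rng(0)
clip_features = rng.normal(size=(100, 512)).astype(np.float32)  # 100 clips

# Project the clip features to 2D. Clips with similar features land near
# each other, so an annotator can select a neighbourhood in the 2D plot
# and assign one action label to the whole group at once.
embedding = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(clip_features)

print(embedding.shape)  # one (x, y) point per clip
```

In practice, the annotator would view `embedding` as a scatter plot (e.g. with matplotlib), lasso-select a cluster, and apply a single label to every clip in the selection, which is where the time savings over clip-by-clip labeling come from.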