Time-Efficient Video Annotation with t-SNE

Abstract

Video understanding has received increasing attention in the past few years due to the availability of several large-scale video datasets and improvements in computational power. However, annotating large-scale video datasets is cost-intensive due to their complexity. In this work, we propose a time-efficient video annotation method that uses spatio-temporal feature similarity and t-SNE dimensionality reduction to make the annotation process more efficient. Placing the same actions from different videos near each other in the two-dimensional space based on feature similarity allows the oracle to label video clips in groups. We evaluate the performance of our method on two subsets of the ActivityNet (v1.3) dataset. We show that our method outperforms conventional video labeling tools in annotation time while maintaining reasonable test accuracy on the video classification task compared to the ground-truth labels. To further evaluate the generalization of our method, we also report its performance on the Sports-1M and Breakfast datasets.
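
For illustration, a minimal sketch of the core projection step described above, assuming clip-level spatio-temporal features have already been extracted (e.g., from a pretrained 3D CNN) and using scikit-learn's t-SNE implementation; the variable names and parameter choices here are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch: project clip-level spatio-temporal features to 2D with t-SNE so that
# clips containing similar actions land near each other and can be labeled in groups.
# `clip_features` is a placeholder (n_clips, feature_dim) array; in practice it would
# come from a spatio-temporal feature extractor such as a pretrained 3D CNN.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
clip_features = rng.normal(size=(500, 2048))  # placeholder features for 500 clips

tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding_2d = tsne.fit_transform(clip_features)  # shape: (500, 2)

# The 2D embedding can then be visualized so the oracle selects clusters of
# nearby clips and assigns one label per cluster instead of labeling each clip.
```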