Deep Learning-Empowered Content-Based Video Image Retrieval

Abstract

The advent of streaming and video has sparked a revolutionary shift in how materials are presented across various fields, such as history, art, and media copyright protection. In this context, scholars and rights holders are seeking efficient ways to index, retrieve, and browse digital content when searching for a specific instance. Unlike instance search in a single image, instance search in a video requires more than extracting the visual features of an image and comparing them against a database: it also involves processing video sequences and retrieving the relevant video segments.

Motivated by this urgent need and by promising applications across diverse disciplines, we present a novel deep-learning-empowered content-based video image retrieval (CBVIR) system with a strong emphasis on real-world applications. The system offers high efficiency and considerable accuracy, addressing the challenges of accessing and using video materials effectively. Our approach begins with the extraction of informative keyframes that capture the essential objects within the video. This process, known as Key Frame Extraction (KFE), distills the most important visual representations for further analysis. Once keyframes are extracted, the resulting dataset is small enough for content-based image retrieval (CBIR), which retrieves similar images from a database based solely on the content of the query image. In this project, a wide range of methods is investigated and analyzed, including traditional representations, handcrafted feature extraction methods, and up-to-date machine learning-based image representations. Our contribution is to strike a balance between high-level and low-level image representations for this task. To improve efficiency, an enhanced color-based feature combined with a dynamic clustering KFE module is proposed and implemented, achieving a high efficiency ratio and satisfactory accuracy. To improve accuracy, a hybrid feature combining traditional and deep learning-based representations is proposed, achieving an acceptable efficiency ratio and the highest accuracy. Overall, we provide an automatic retrieval system that requires much less user engagement, together with a prototype system GUI.
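To make the keyframe extraction idea concrete, the following is a minimal sketch of color-histogram features combined with a threshold-based sequential clustering. It is an illustration under stated assumptions, not the system's actual implementation: the function names `color_histogram` and `extract_keyframes`, the histogram-intersection similarity, the `threshold` and `bins` values, and the rule of picking the frame closest to each cluster centroid are all hypothetical choices made for this example. Frames are assumed to be NumPy arrays of shape (H, W, 3) with 8-bit channels.

```python
import numpy as np


def color_histogram(frame, bins=8):
    """Concatenate per-channel histograms and normalize them to sum to 1."""
    hists = [
        np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]
    h = np.concatenate(hists).astype(np.float64)
    return h / h.sum()


def extract_keyframes(frames, threshold=0.8, bins=8):
    """Group consecutive frames by color similarity; return one index per group.

    The number of clusters is not fixed in advance: a new cluster starts
    whenever a frame's histogram intersection with the current cluster
    centroid drops below `threshold` (an assumed value for this sketch).
    """
    feats = [color_histogram(f, bins) for f in frames]
    clusters = []   # list of lists of frame indices
    centroids = []  # mean histogram of each cluster

    for i, h in enumerate(feats):
        if centroids and np.minimum(h, centroids[-1]).sum() >= threshold:
            # Similar enough: join the current cluster and update its centroid.
            clusters[-1].append(i)
            centroids[-1] = np.mean([feats[j] for j in clusters[-1]], axis=0)
        else:
            # Too different (or first frame): open a new cluster.
            clusters.append([i])
            centroids.append(h.copy())

    # Keyframe of each cluster: the member most similar to the centroid.
    keyframes = []
    for members, centroid in zip(clusters, centroids):
        best = max(members, key=lambda j: np.minimum(feats[j], centroid).sum())
        keyframes.append(best)
    return keyframes


if __name__ == "__main__":
    # Synthetic "video": three dark frames followed by three bright frames.
    dark = [np.zeros((4, 4, 3), dtype=np.uint8) for _ in range(3)]
    bright = [np.full((4, 4, 3), 255, dtype=np.uint8) for _ in range(3)]
    print(extract_keyframes(dark + bright))
```

The returned indices identify the frames that would be passed on to the CBIR stage, so the retrieval step compares the query image against a handful of keyframes instead of every frame in the video.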