Search by Image: Deep Learning Based Image Visual Feature Extraction

Master thesis (2022)

Authors

Y. HU Electrical Engineering, Mathematics and Computer Science

Contributors

J.H.G. Dauwels Signal Processing Systems - (mentor)

N. Tömen Pattern Recognition and Bioinformatics - (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:e0d2bc46-3caa-43ae-b83f-37b830757eac

Published Date

22-08-2022

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In recent years, the expansion of the Internet has brought an explosion of visual information, including social media, medical photographs, and digital history collections. Generating and sharing such large volumes of visual content presents new challenges, especially when searching a database for images similar to a query, a task known as Content-Based Image Retrieval (CBIR). Feature extraction is the foundation of image retrieval, which makes obtaining accurate features and representations of image content a vital research concern.
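To make the CBIR setting concrete, the sketch below assumes every image has already been turned into a fixed-length feature vector and ranks the database images by cosine similarity to the query. It is a minimal NumPy illustration; the function name `rank_database` is hypothetical and is not taken from the thesis or its GUI code.

```python
import numpy as np

def rank_database(query: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Return database indices sorted from most to least similar to the query.

    query:    (D,) feature vector of the query image
    database: (N, D) matrix with one feature vector per database image
    Both sides are L2-normalised, so the dot product equals cosine similarity.
    """
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q                 # (N,) cosine similarities
    return np.argsort(-scores)      # indices in descending order of similarity

# Usage: three 2-D "images"; the query points almost along the first axis.
database = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(rank_database(np.array([1.0, 0.1]), database))
```

Because the comparison reduces to a matrix-vector product, the quality of retrieval rests almost entirely on how well the feature vectors capture image content.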
In the feature extraction module, we first pre-process the target image and feed it into a convolutional neural network (CNN) to obtain a set of feature maps, one per channel. Pooling aggregates these feature maps into compact, fixed-length global descriptors. These global descriptors are then reduced in dimensionality and normalized by whitening, yielding image feature vectors that are cheap to compute and compare. In this process, retrieval accuracy depends on how faithfully the final feature vectors represent the content of the target image. Therefore, various CNN architectures, pooling methods, and whitening methods have been proposed to obtain more discriminative feature vectors.
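The pooling and whitening steps described above can be sketched in NumPy as follows. The abstract does not state which pooling or whitening methods the final model uses, so this sketch assumes two common choices: generalized-mean (GeM) pooling and PCA whitening learned from a set of training descriptors. All function names are hypothetical, and the sketch is not the thesis's ResNet-SOI pipeline.

```python
import numpy as np

def gem_pool(feature_maps: np.ndarray, p: float = 3.0, eps: float = 1e-6) -> np.ndarray:
    """Generalized-mean (GeM) pooling of CNN feature maps (assumed method).

    feature_maps: (C, H, W) non-negative activations, e.g. after ReLU.
    Returns a (C,) global descriptor; p=1 is average pooling and a large p
    approaches max pooling.
    """
    x = np.clip(feature_maps, eps, None)          # GeM needs strictly positive inputs
    return np.mean(x ** p, axis=(1, 2)) ** (1.0 / p)

def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-12)

def learn_pca_whitening(descriptors: np.ndarray, out_dim: int):
    """Learn a PCA-whitening projection from (N, C) training descriptors."""
    mean = descriptors.mean(axis=0)
    centered = descriptors - mean
    cov = centered.T @ centered / len(descriptors)
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:out_dim]    # keep the largest components
    # Scale each kept eigenvector so the projected data has unit variance.
    proj = eigvecs[:, order] / np.sqrt(eigvals[order] + 1e-9)
    return mean, proj

def apply_whitening(descriptors: np.ndarray, mean: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Project, whiten, and re-normalise descriptors to shape (N, out_dim)."""
    return l2_normalize((descriptors - mean) @ proj)

# Usage with random stand-ins for ten images' feature maps (512 channels, 7x7).
rng = np.random.default_rng(0)
maps = rng.random((10, 512, 7, 7))
descs = np.stack([l2_normalize(gem_pool(m)) for m in maps])   # (10, 512)
mean, proj = learn_pca_whitening(descs, out_dim=8)
vectors = apply_whitening(descs, mean, proj)                  # (10, 8), unit norm
```

In practice the whitening would be learned on descriptors from a separate training set and then applied to both query and database descriptors. Setting `p=1` in `gem_pool` reduces it to average pooling, which makes it easy to compare pooling choices within the same pipeline.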
In this thesis, our study (1) fine-tunes pre-trained CNNs, (2) optimizes the use of second-order attention information in feature maps, (3) applies and compares popular feature enhancement methods for both aggregation and whitening, (4) explores how to combine their strengths, and (5) proposes a new model, ResNet-SOI, which achieves 53.4 and 59.2 mAP under the Medium (M) protocol on the challenging ROxford5k+1M and RParis6k+1M benchmarks, respectively, outperforming state-of-the-art methods. Our prototype GUI is available on GitHub (https://github.com/yanan-huu/Image-Search-Engine-for-Historical-Research).
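The reported figures are mean average precision (mAP) over the benchmark queries. The sketch below computes non-interpolated average precision for each ranked list and averages it over queries; images marked as junk are skipped rather than counted as errors, in the spirit of the revisited Oxford and Paris protocols, which ignore such images. This is a simplified illustration under stated assumptions: the official evaluation code for those benchmarks may estimate precision slightly differently, so values from this sketch are not directly comparable to 53.4 or 59.2. The function names are hypothetical.

```python
from typing import AbstractSet, Sequence

def average_precision(ranking: Sequence[int], positives: AbstractSet[int],
                      junk: AbstractSet[int] = frozenset()) -> float:
    """Non-interpolated AP of one ranked list (simplified, assumed protocol).

    ranking:   database indices ordered from most to least similar
    positives: indices of the relevant database images for this query
    junk:      indices to ignore entirely (they do not consume a rank)
    """
    if not positives:
        return 0.0
    hits, rank, precision_sum = 0, 0, 0.0
    for idx in ranking:
        if idx in junk:
            continue                      # skipped: neither a hit nor an error
        rank += 1
        if idx in positives:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / len(positives)

def mean_average_precision(rankings, positives_per_query, junk_per_query=None) -> float:
    """Average the per-query AP values over all queries."""
    if junk_per_query is None:
        junk_per_query = [frozenset()] * len(rankings)
    aps = [average_precision(r, p, j)
           for r, p, j in zip(rankings, positives_per_query, junk_per_query)]
    return sum(aps) / len(aps)

# Usage: relevant images at ranks 1 and 3 give AP = (1/1 + 2/3) / 2 = 5/6.
print(average_precision([3, 1, 2, 0], {3, 2}))
```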

Files

MSc_thesis_YananHu.pdf

(.pdf | 7.49 Mb)