Search by Image: Deep Learning Based Image Visual Feature Extraction

Abstract

In recent years, the expansion of the Internet has brought an explosion of visual information from sources such as social media, medical photographs, and digital history collections. The sheer volume of visual content being generated and shared presents new challenges, especially when searching databases for similar content, a task known as Content-Based Image Retrieval (CBIR). Feature extraction is the foundation of image retrieval, so obtaining accurate features and representations of image content is a central research concern.
In the feature extraction module, we first pre-process the target image and feed it into a CNN to obtain a feature map for each channel. Pooling aggregates these feature maps into a compact, uniform global descriptor. Whitening then reduces the dimensionality of this descriptor and normalizes it, yielding an image feature vector that is easy to compute and compare. Retrieval accuracy therefore depends on how faithfully the final feature vector represents the content of the target image, and various CNN architectures, pooling methods, and whitening methods have been proposed to obtain more representative feature vectors.
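The pooling, whitening, and comparison steps described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the thesis implementation: random arrays stand in for CNN feature maps, generalized-mean (GeM) pooling is used as one popular pooling choice, PCA whitening as one common whitening choice, and all function names, array sizes, and the output dimension of 32 are hypothetical.

```python
import numpy as np

def gem_pool(fmap, p=3.0, eps=1e-6):
    """Generalized-mean pooling: (C, H, W) feature maps -> (C,) descriptor."""
    clipped = np.clip(fmap, eps, None)  # avoid zero or negative bases
    return np.mean(clipped ** p, axis=(1, 2)) ** (1.0 / p)

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def fit_pca_whitening(descs, out_dim, eps=1e-9):
    """Learn a mean and a whitening projection from (N, C) descriptors."""
    mean = descs.mean(axis=0)
    cov = np.cov(descs - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:out_dim]  # keep the largest components
    proj = eigvecs[:, order] / np.sqrt(eigvals[order] + eps)
    return mean, proj

def describe(fmap, mean, proj):
    """Feature maps -> pooled, whitened, L2-normalized feature vector."""
    pooled = l2_normalize(gem_pool(fmap))
    return l2_normalize((pooled - mean) @ proj)

# Assumption: random arrays stand in for CNN outputs
# (200 images, 64 channels, 7x7 spatial grid).
rng = np.random.default_rng(0)
feature_maps = rng.random((200, 64, 7, 7))

# Learn the whitening on pooled descriptors, then index 50 database images.
pooled = l2_normalize(np.stack([gem_pool(m) for m in feature_maps]))
mean, proj = fit_pca_whitening(pooled, out_dim=32)
database = np.stack([describe(m, mean, proj) for m in feature_maps[:50]])

# Query with image 7; vectors are unit length, so a dot product is cosine similarity.
query = describe(feature_maps[7], mean, proj)
scores = database @ query
ranking = np.argsort(-scores)  # database indices, best match first
```

Because every vector is L2-normalized, ranking the database reduces to a single matrix-vector product, which is what makes the final descriptors cheap to compare at scale.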
In this thesis, we (1) fine-tune pre-trained CNNs, (2) optimize the application of second-order attention information in the feature maps, (3) apply and compare popular feature enhancement methods for both aggregation and whitening, (4) explore how to combine their strengths, and (5) propose a new model, ResNet-SOI, which achieves 53.4 (M) and 59.2 (M) mAP on the challenging benchmarks ROxford5k+1M and RParis6k+1M, respectively, and outperforms state-of-the-art methods. Our prototype GUI is available on GitHub (https://github.com/yanan-huu/Image-Search-Engine-for-Historical-Research).