Image-Based Video Search Engine
Feature Extraction
Abstract
One of the main problems in Instance-Level Image Retrieval on video data is that, for query videos containing multiple objects of the same instance, extracting features from the keyframes of the query video is time-consuming. This thesis aims to solve this problem by implementing a Convolutional Neural Network based approach, which significantly reduces the extraction time and increases the accuracy. After analysing multiple methods, Second-Order Loss and Attention for Image Retrieval (SOLAR) was found to be the most promising. SOLAR is evaluated on three performance metrics: mean average precision, recall, and extraction time per image. The evaluation is based on a selection of videos and images from a dataset provided by Dr. Andrea Nanetti of the Engineering Historical Memory project. On this dataset, SOLAR achieved a mean average precision of 83%, a recall of 85%, and an extraction time of 0.73 seconds per image. To conclude, SOLAR is specialised in detecting and describing landmarks due to its pre-trained model; for a more general use case, the backbone model should be trained on different data, which is expected to increase the accuracy. Future work could also include speed improvements by looking into object detection methods.
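The abstract reports results in terms of mean average precision and recall over ranked retrieval results. As a point of reference only, the sketch below shows one common way these two metrics are computed for image retrieval; it is not the thesis's exact evaluation protocol, and the function names and toy data are purely illustrative.

```python
# Illustrative sketch (not the thesis's exact protocol) of the two retrieval
# metrics named in the abstract: average precision over a ranked list of
# retrieved images, and recall as the fraction of relevant images retrieved.

from typing import List, Set


def average_precision(ranked_ids: List[str], relevant_ids: Set[str]) -> float:
    """Average precision of one query: mean of precision@k at each relevant hit."""
    hits, precision_sum = 0, 0.0
    for k, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant_ids:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def recall(ranked_ids: List[str], relevant_ids: Set[str]) -> float:
    """Fraction of the relevant images that appear in the returned ranking."""
    if not relevant_ids:
        return 0.0
    return len(relevant_ids.intersection(ranked_ids)) / len(relevant_ids)


def mean_average_precision(results: List[List[str]], ground_truth: List[Set[str]]) -> float:
    """Mean of per-query average precision over all queries."""
    return sum(average_precision(r, g) for r, g in zip(results, ground_truth)) / len(results)


# Hypothetical toy example: two queries against a small image database.
results = [["a", "b", "c"], ["d", "a", "e"]]
ground_truth = [{"a", "c"}, {"e"}]
print(mean_average_precision(results, ground_truth))  # (0.833 + 0.333) / 2 ≈ 0.58
print(recall(results[0], ground_truth[0]))            # 1.0
```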