Image-Based Video Search Engine
Feature Extraction
M.J.F. van Oort (TU Delft - Electrical Engineering, Mathematics and Computer Science)
K.A. Hoogeveen (TU Delft - Electrical Engineering, Mathematics and Computer Science)
J.H.G. Dauwels – Mentor (TU Delft - Signal Processing Systems)
L. Pakula – Graduation committee member (TU Delft - Electronic Instrumentation)
Nuria Llombart Juan – Graduation committee member (TU Delft - Tera-Hertz Sensing)
Abstract
One of the main problems with instance-level image retrieval in video data is the cost of feature extraction. When a query video contains multiple objects of the same instance, extracting features from its keyframes is time-consuming. This thesis addresses this problem with a Convolutional Neural Network (CNN)-based approach that significantly reduces the extraction time and increases the accuracy. After several methods were analysed, Second-Order Loss and Attention for image Retrieval (SOLAR) was found to be the most promising. SOLAR is evaluated on three performance metrics: the mean average precision (mAP), the recall and the extraction time per image. The evaluation uses a selection of videos and images from a dataset provided by Dr. Andrea Nanetti from the Engineering Historical Memory project. On this dataset, SOLAR achieved a mean average precision of 83%, a recall of 85% and an extraction time of 0.73 seconds per image. In conclusion, SOLAR is specialised in detecting and describing landmarks because of its pre-trained model. For a more general use case, the backbone model should be trained differently, which is expected to increase the accuracy. Future work could also improve speed by investigating object detection methods.
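The abstract reports retrieval quality as mean average precision and recall. The following is a minimal sketch of how these metrics can be computed from a ranked retrieval list. It assumes that each image is represented by a single global descriptor, that descriptors are compared with cosine similarity, and that average precision uses the standard non-interpolated definition. It does not reproduce SOLAR's descriptor extraction. The descriptor values, the image ids and the recall cutoff `k` are hypothetical. The abstract does not state which recall cutoff was used.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two global image descriptors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_database(query: Sequence[float], database: Dict[str, Sequence[float]]) -> List[str]:
    """Return database image ids sorted by decreasing similarity to the query."""
    return sorted(database, key=lambda k: cosine_similarity(query, database[k]), reverse=True)


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated AP: mean of precision@k over the ranks k of relevant items.

    Relevant items that never appear in the ranking contribute a precision of 0,
    because the sum is divided by the total number of relevant items.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def recall_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant items that appear among the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & relevant) / len(relevant)


def mean_average_precision(aps: List[float]) -> float:
    """mAP: the average of per-query AP values."""
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Hypothetical 2-D descriptors; real CNN descriptors are much longer.
    database = {
        "img_a": [1.0, 0.0],
        "img_b": [0.9, 0.1],
        "img_c": [0.0, 1.0],
        "img_d": [0.7, 0.7],
    }
    ranking = rank_database([1.0, 0.0], database)
    print(ranking)  # → ['img_a', 'img_b', 'img_d', 'img_c']
    relevant = {"img_a", "img_d"}
    print(round(average_precision(ranking, relevant), 3))  # → 0.833
    print(recall_at_k(ranking, relevant, k=2))  # → 0.5
```

With one AP value per query image, the dataset-level mAP is simply `mean_average_precision` over all queries. Dividing AP by the total number of relevant items, rather than by the number retrieved, penalises a method that misses relevant images entirely.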