Image-Based Video Search Engine

Feature Extraction

Bachelor Thesis (2022)
Author(s)

M.J.F. van Oort (TU Delft - Electrical Engineering, Mathematics and Computer Science)

K.A. Hoogeveen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.H.G. Dauwels – Mentor (TU Delft - Signal Processing Systems)

L. Pakula – Graduation committee member (TU Delft - Electronic Instrumentation)

Nuria Llombart Juan – Graduation committee member (TU Delft - Tera-Hertz Sensing)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Max van Oort, Aron Hoogeveen
Publication Year
2022
Language
English
Graduation Date
20-06-2022
Awarding Institution
Delft University of Technology
Project
Bachelor Graduation Project
Programme
Electrical Engineering
Related content

Link to the codebase (GitHub).

https://github.com/aron-hoogeveen/ibvse
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

One of the main problems with Instance-Level Image Retrieval in video data is that, for query videos containing multiple objects of the same instance, extracting features from the keyframes of the query video is time-consuming. This thesis aims to solve this problem by implementing a Convolutional Neural Network based approach that significantly reduces the extraction time and increases the accuracy. After analysing multiple methods, Second-Order Loss and Attention for image Retrieval (SOLAR) was found to be the most promising. SOLAR is tested on three performance metrics: the mean average precision, the recall and the extraction time per image. The performance is evaluated on a selection of videos and images from a dataset provided by Dr. Andrea Nanetti of the Engineering Historical Memory project. On this dataset, SOLAR achieved a mean average precision of 83 %, a recall of 85 % and an extraction time of 0.73 seconds per image. In conclusion, the pre-trained SOLAR model is specialised in detecting and describing landmarks; for a more general case, the backbone model should be trained differently, which is expected to increase the accuracy. Future work could also pursue speed improvements by looking into object detection methods.
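
The three metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the input layout (a ranked list of hit/miss flags per query) and the definition of recall as the fraction of relevant items that appear in the returned ranking are all assumptions made for illustration, and `extract` stands in for any feature extractor.

```python
import time
from typing import Callable, Iterable, Sequence, Tuple


def average_precision(ranked_hits: Sequence[bool], n_relevant: int) -> float:
    """AP for one query: precision@k summed over the ranks k that hold a
    relevant item, divided by the total number of relevant items."""
    if n_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_relevant


def mean_average_precision(queries: Sequence[Tuple[Sequence[bool], int]]) -> float:
    """Mean of the per-query AP values; each query is (ranked_hits, n_relevant)."""
    return sum(average_precision(h, n) for h, n in queries) / len(queries)


def recall(ranked_hits: Sequence[bool], n_relevant: int) -> float:
    """Assumed definition: share of relevant items present in the ranking."""
    return sum(ranked_hits) / n_relevant if n_relevant else 0.0


def seconds_per_image(extract: Callable[[object], object], images: Iterable) -> float:
    """Average wall-clock time of `extract` (a hypothetical extractor) per image."""
    images = list(images)
    start = time.perf_counter()
    for image in images:
        extract(image)
    return (time.perf_counter() - start) / len(images)


if __name__ == "__main__":
    # Two toy queries, each with two relevant items in the database.
    queries = [([True, False, True], 2), ([False, True], 2)]
    print(mean_average_precision(queries))
    print([recall(h, n) for h, n in queries])
```

Average precision rewards rankings that place relevant frames early, which is why it is reported alongside recall: recall alone does not distinguish a relevant result at rank 1 from one at rank 100.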
