Image-based Video Search Engine: Keyframe Extraction

Abstract

This report details the analysis and design of a system that extracts keyframes from videos. The need for such a sub-module stems from the similarity of frames in a video. To help reduce the computation time of the content-based video search engine, the Keyframe Extraction Module reduces the number of frames by discarding frames that carry similar information. Determining which frames can be considered similar is one of the main challenges, as there are many ways to quantify how much frames differ. In the past decades, much research has been done on keyframe extraction and video summarization, and many methods have been proposed for selecting keyframes, varying in what they consider salient information and in computation time. The most challenging part of the design is the time constraint, which called for a careful analysis of which methods are suitable. After all, this limitation is rarely a major topic in research papers on video summarization. This report covers Shot Detection techniques along with various Keyframe Extraction methods, which can be categorized into clustering, visual content, fixed selection and uniform sampling. Furthermore, evaluation methods such as the Fidelity measure are addressed, as judging the quality of a keyframe selection is not trivial. It is concluded that, out of the techniques analyzed, a combination of VSUMM clustering and histogram matching, along with histogram-based shot boundary detection with a CFAR threshold and pre-sampling, is most suitable for the general case under a time constraint. Future work could include investigating hierarchical clustering methods and optimizing the Shot Boundary Detection module following the most recent papers.
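To make the shot detection idea concrete, the following is a minimal sketch of histogram-based shot boundary detection with pre-sampling and a CFAR-style adaptive threshold. It is an illustration under stated assumptions, not the report's implementation: frames are assumed to be non-empty flat lists of grayscale intensities (0 to 255), and the function names and the parameters `step`, `window`, `guard`, `scale` and `floor` are hypothetical choices made for this example.

```python
from statistics import mean


def gray_histogram(frame, bins=16):
    """Normalized intensity histogram of a frame (flat list of 0-255 values)."""
    hist = [0] * bins
    for pixel in frame:
        hist[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [count / total for count in hist]


def histogram_distance(h1, h2):
    """Half the L1 distance between two normalized histograms, in [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shot_boundaries(frames, step=1, window=4, guard=1,
                           scale=3.0, floor=0.05):
    """Return original frame indices at which a new shot is assumed to start.

    step   -- pre-sampling: only every `step`-th frame is examined
    window -- number of training cells on each side of the cell under test
    guard  -- neighbouring cells skipped so a cut does not raise its own threshold
    scale  -- multiplier on the local mean difference (CFAR-style threshold)
    floor  -- minimum threshold, so a noise-free neighbourhood does not fire on 0
    """
    sampled = list(range(0, len(frames), step))
    hists = [gray_histogram(frames[i]) for i in sampled]
    diffs = [histogram_distance(hists[k], hists[k + 1])
             for k in range(len(hists) - 1)]

    boundaries = []
    for k, diff in enumerate(diffs):
        lo = max(0, k - guard - window)
        hi = min(len(diffs), k + guard + window + 1)
        # Training cells: local neighbours outside the guard band.
        training = [diffs[j] for j in range(lo, hi) if abs(j - k) > guard]
        noise = mean(training) if training else 0.0
        if diff > max(scale * noise, floor):
            boundaries.append(sampled[k + 1])
    return boundaries


if __name__ == "__main__":
    # Five dark frames followed by five bright frames: one cut at index 5.
    frames = [[10] * 64 for _ in range(5)] + [[200] * 64 for _ in range(5)]
    print(detect_shot_boundaries(frames))
    print(detect_shot_boundaries(frames, step=2))
```

With pre-sampling (`step` greater than 1) fewer histograms are computed, at the cost of locating each boundary only to within `step` frames; this is the kind of trade-off that matters under a time constraint.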