Recent advances in video content analysis

From visual features to semantic video segments


Abstract

This paper addresses the problem of automatically partitioning a video into semantic segments using visual low-level features only. Semantic segments can be understood as the content building blocks of a video with a clear sequential content structure. Examples are reports in a news program, episodes in a movie, scenes of a situation comedy or topic segments of a documentary. In some video genres like news programs or documentaries, the usage of different media (visual, audio, speech, text) may be beneficial or is even unavoidable for reliably detecting the boundaries between semantic segments. In many other genres, however, the pay-off of using different media for the purpose of high-level segmentation is not high. On the one hand, relating the audio, speech or text to the semantic temporal structure of video content is generally very difficult. This is especially so in "acting" video genres like movies and situation comedies. On the other hand, the information contained in the visual stream of these video genres often seems to provide the major clue about the position of semantic segment boundaries. Partitioning a video into semantic segments can be performed by measuring the coherence of the content along neighboring video shots of a sequence. The segment boundaries are then found at places (e.g., shot boundaries) where the values of content coherence are sufficiently low. On the basis of two state-of-the-art techniques for content coherence modeling, we illustrate in this paper the current possibilities for detecting the boundaries of semantic segments using visual low-level features only. © 2018 World Scientific Publishing Company
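
The abstract describes boundary detection by locating points of low content coherence between neighboring shots. The following is a minimal sketch of that general idea, not the authors' specific coherence models: it assumes each shot is summarized by a visual low-level feature vector (e.g., an averaged color histogram), measures coherence at each inter-shot boundary as the cosine similarity of features aggregated over a small window of shots on either side, and reports boundaries where coherence is a local minimum below a threshold. All function names, the window size, and the threshold are illustrative assumptions.

```python
# Sketch of coherence-based semantic segmentation (illustrative, not the
# paper's exact method). Shots are rows of a feature matrix; coherence at
# each inter-shot boundary compares aggregated features left vs. right.
import numpy as np

def content_coherence(shot_features: np.ndarray, window: int = 3) -> np.ndarray:
    """Coherence value at each inter-shot boundary (length = n_shots - 1)."""
    n = len(shot_features)
    coherence = np.empty(n - 1)
    for b in range(n - 1):
        left = shot_features[max(0, b - window + 1): b + 1].mean(axis=0)
        right = shot_features[b + 1: b + 1 + window].mean(axis=0)
        # cosine similarity between the aggregated left/right feature vectors
        coherence[b] = left @ right / (
            np.linalg.norm(left) * np.linalg.norm(right) + 1e-12)
    return coherence

def segment_boundaries(coherence: np.ndarray, threshold: float = 0.6) -> list:
    """Shot boundaries whose coherence is a local minimum below the threshold."""
    boundaries = []
    for b in range(len(coherence)):
        is_local_min = ((b == 0 or coherence[b] <= coherence[b - 1]) and
                        (b == len(coherence) - 1 or coherence[b] <= coherence[b + 1]))
        if is_local_min and coherence[b] < threshold:
            boundaries.append(b)  # semantic segment boundary after shot index b
    return boundaries

if __name__ == "__main__":
    # Toy example: two "scenes" of 10 shots each with different dominant features.
    rng = np.random.default_rng(0)
    scene_a = np.abs(rng.normal(0.0, 0.05, size=(10, 16))) + np.eye(16)[0]
    scene_b = np.abs(rng.normal(0.0, 0.05, size=(10, 16))) + np.eye(16)[5]
    shots = np.vstack([scene_a, scene_b])
    coh = content_coherence(shots)
    print("segment boundaries after shot indices:", segment_boundaries(coh))
```

In this sketch the coherence curve stays high inside a scene, where neighboring shots share similar visual statistics, and dips at the transition between the two toy scenes; the threshold and window size would need tuning per genre and feature choice.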