Photo2Video

Semantic-Aware Deep Learning-Based Video Generation from Still Content

Journal Article (2022)

Author(s)

Paula Viana (Polytechnic Institute of Porto, Institute for Systems and Computer Engineering, Technology and Science (INESC TEC))

Maria Teresa Andrade (Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), Universidade do Porto)

Pedro Carvalho (Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), Polytechnic Institute of Porto)

Luis Vilaça (Institute for Systems and Computer Engineering, Technology and Science (INESC TEC))

Inês N. Teixeira (Institute for Systems and Computer Engineering, Technology and Science (INESC TEC))

Tiago Costa (Institute for Systems and Computer Engineering, Technology and Science (INESC TEC))

Pieter Jonker (TU Delft - Mechanical Engineering, QdepQ Systems B.V.)

Research Group

Biomechatronics & Human-Machine Control

Keywords

Deep learning; Storytelling; Context awareness; Automated content creation; RoI; Semantic awareness

DOI related publication

https://doi.org/10.3390/jimaging8030068 (final published version)

To reference this document use

https://resolver.tudelft.nl/uuid:d440d3fc-a010-4550-8ffe-65c136162079

Publication Year

2022

Language

English

Journal title

Journal of Imaging

Issue number

3

Volume number

8

Article number

68

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Applying machine learning (ML), and especially deep learning, to understand visual content is becoming common practice in many application areas. However, little attention has been given to its use within the multimedia creative domain. ML is already popular for content creation, but the progress achieved so far essentially addresses textual content or the identification and selection of specific types of content. A wealth of possibilities is yet to be explored by bringing ML into the multimedia creative process, allowing the knowledge it infers to automatically influence how new multimedia content is created. The work presented in this article contributes towards this goal in three distinct ways: firstly, it proposes a methodology to re-train popular neural network models to identify new thematic concepts in static visual content and attach meaningful annotations to the detected regions of interest; secondly, it presents varied visual digital effects and corresponding tools that can be automatically called upon to apply such effects to a previously analyzed photo; thirdly, it defines a complete automated creative workflow, from the acquisition of a photograph and corresponding contextual data, through the ML region-based annotation, to the automatic application of digital effects and generation of a semantically aware multimedia story driven by the previously derived situational and visual contextual data. Additionally, it presents a variant of this automated workflow that lets the user manipulate the automatic annotations in an assisted manner. The final aim is to transform a static digital photo into a short video clip, taking into account the information acquired. The final result strongly contrasts with current standard approaches that create random movements, by producing an intelligent content- and context-aware video.
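To make the contrast with random movement concrete, the following is a minimal sketch of one possible effect: a zoom whose target is an annotated region of interest (RoI) rather than an arbitrary point. It is not the authors' implementation. The `Box` type, the `fit_aspect` and `zoom_path` helpers, and the example RoI coordinates are all hypothetical names and values chosen for illustration, and the sketch assumes the RoI has already been produced by some detector.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (hypothetical helper type)."""
    x: float
    y: float
    w: float
    h: float


def fit_aspect(roi: Box, frame_w: int, frame_h: int) -> Box:
    """Grow the RoI to the frame's aspect ratio, keeping it centered and inside the frame."""
    aspect = frame_w / frame_h
    w, h = roi.w, roi.h
    if w / h < aspect:
        w = h * aspect   # RoI too narrow: widen it
    else:
        h = w / aspect   # RoI too wide: make it taller
    # The crop window can never be larger than the photo itself.
    scale = min(1.0, frame_w / w, frame_h / h)
    w, h = w * scale, h * scale
    cx, cy = roi.x + roi.w / 2, roi.y + roi.h / 2
    x = min(max(cx - w / 2, 0.0), frame_w - w)
    y = min(max(cy - h / 2, 0.0), frame_h - h)
    return Box(x, y, w, h)


def zoom_path(frame_w: int, frame_h: int, roi: Box, n_frames: int) -> List[Box]:
    """Return one crop window per video frame, moving linearly from the full photo to the RoI."""
    start = Box(0.0, 0.0, float(frame_w), float(frame_h))
    end = fit_aspect(roi, frame_w, frame_h)
    path = []
    for i in range(n_frames):
        t = i / (n_frames - 1) if n_frames > 1 else 1.0
        path.append(Box(
            start.x + t * (end.x - start.x),
            start.y + t * (end.y - start.y),
            start.w + t * (end.w - start.w),
            start.h + t * (end.h - start.h),
        ))
    return path


# Assumed example: a 1920x1080 photo with one annotated region.
path = zoom_path(1920, 1080, Box(800, 400, 200, 300), n_frames=5)
```

Each returned window would then be cropped from the photo and resized to the output resolution to form one video frame. Because the start and end windows share the frame's aspect ratio, every interpolated window does too, so the resized frames are not distorted.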

Files

Jimaging_08_00068.pdf

(PDF, 6.94 MB)