One-step time-dependent future video frame prediction with a convolutional encoder-decoder neural network

Conference Paper (2017)

Author(s)

V. Vukotic (INSA France, INRIA/IRISA, TU Delft - Pattern Recognition and Bioinformatics)

S. Pintea (TU Delft - Pattern Recognition and Bioinformatics)

Christian Raymond (INRIA/IRISA, INSA France)

Guillaume Gravier (Centre national de la recherche scientifique (CNRS), INRIA/IRISA)

Jan van Gemert (TU Delft - Pattern Recognition and Bioinformatics)

Research Group

Pattern Recognition and Bioinformatics

DOI related publication

https://doi.org/10.1007/978-3-319-68560-1_13

CNNs, Generative models, Action forecasting, Appearance prediction, Future video frame prediction, Scene understanding

To reference this document use:

https://resolver.tudelft.nl/uuid:cd6245bc-b348-4841-8bd7-76900a0a48a9

Publication Year

2017

Language

English

Pages (from-to)

140-151

ISBN (print)

978-3-319-68559-5

ISBN (electronic)

978-3-319-68560-1

Abstract

There is an inherent need for autonomous cars, drones, and other robots to have a notion of how their environment behaves and to anticipate changes in the near future. In this work, we focus on anticipating future appearance given the current frame of a video. Existing work focuses on either predicting the future appearance as the next frame of a video, or predicting future motion as optical flow or motion trajectories starting from a single video frame. This work stretches the ability of CNNs (Convolutional Neural Networks) to predict an anticipation of appearance at an arbitrarily given future time, not necessarily the next video frame. We condition our predicted future appearance on a continuous time variable that allows us to anticipate future frames at a given temporal distance, directly from the input video frame. We show that CNNs can learn an intrinsic representation of typical appearance changes over time and successfully generate realistic predictions at a deliberate time difference in the near future.
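
The time-conditioned setup described in the abstract can be illustrated with a small data-preparation sketch. This is only an illustration under stated assumptions, not the authors' procedure: the record does not describe how training examples were built, and the function name `make_time_conditioned_triples`, the `fps` and `max_offset_s` parameters, and the uniform sampling scheme are all hypothetical. It shows how one video could yield (input frame, time difference, target frame) triples, where the time difference is a continuous value in seconds rather than a fixed "next frame" step.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")  # any frame representation (array, image, ...)


def make_time_conditioned_triples(
    frames: Sequence[Frame],
    fps: float,
    max_offset_s: float,
    n_samples: int,
    seed: int = 0,
) -> List[Tuple[Frame, float, Frame]]:
    """Sample (input_frame, delta_t_seconds, target_frame) triples from one video.

    Hypothetical helper: the sampling scheme is an assumption made for
    illustration, not something stated in the record.
    """
    rng = random.Random(seed)
    # Largest temporal displacement, expressed in whole frames.
    max_offset = int(round(max_offset_s * fps))
    if max_offset < 1 or len(frames) < 2:
        raise ValueError("need at least two frames and a positive time horizon")
    max_offset = min(max_offset, len(frames) - 1)

    triples = []
    for _ in range(n_samples):
        offset = rng.randint(1, max_offset)  # displacement in frames
        start = rng.randint(0, len(frames) - 1 - offset)  # input frame index
        delta_t = offset / fps  # continuous time variable, in seconds
        triples.append((frames[start], delta_t, frames[start + offset]))
    return triples


# Usage: integers stand in for frames so the pairing is easy to inspect.
video = list(range(100))
triples = make_time_conditioned_triples(video, fps=25.0, max_offset_s=0.4, n_samples=5)
```

A model trained on such triples would receive the input frame together with `delta_t` and be asked to produce the target frame, which is one way to read "a continuous time variable" in the abstract.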

Metadata only record. There are no files for this record.