ViewFormer

NeRF-Free Neural Rendering from Few Images Using Transformers

Conference Paper (2022)
Author(s)

Jonáš Kulhánek (Czech Technical University)

Erik Derner (Czech Technical University)

Torsten Sattler (Czech Technical University)

Robert Babuska (Czech Technical University, TU Delft - Learning & Autonomous Control)

Research Group
Learning & Autonomous Control
Copyright
© 2022 Jonáš Kulhánek, Erik Derner, Torsten Sattler, R. Babuska
DOI related publication
https://doi.org/10.1007/978-3-031-19784-0_12
Publication Year
2022
Language
English
Bibliographical Note
Green Open Access added to the TU Delft Institutional Repository as part of the Taverne project, 'You share, we take care!' (https://www.openaccess.nl/en/you-share-we-take-care). Otherwise, as indicated in the copyright section: the publisher is the copyright holder of this work, and the author uses Dutch legislation to make the work publicly available.
Pages (from-to)
198-216
ISBN (print)
978-3-031-19783-3
ISBN (electronic)
978-3-031-19784-0
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward, or distribute the text or any part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Novel view synthesis is a long-standing problem. In this work, we consider a variant of the problem where we are given only a few context views sparsely covering a scene or an object. The goal is to predict novel viewpoints in the scene, which requires learning priors. The current state of the art is based on Neural Radiance Fields (NeRF), and while these methods achieve impressive results, they suffer from long training times, as they require evaluating millions of 3D point samples via a neural network for each image. We propose a 2D-only method that maps multiple context views and a query pose to a new image in a single pass of a neural network. Our model uses a two-stage architecture consisting of a codebook and a transformer model. The codebook embeds individual images into a smaller latent space, and the transformer solves the view synthesis task in this more compact space. To train our model efficiently, we introduce a novel branching attention mechanism that allows us to use the same model not only for neural rendering but also for camera pose estimation. Experimental results on real-world scenes show that our approach is competitive with NeRF-based methods while not reasoning explicitly in 3D, and it is faster to train.
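The two-stage design described in the abstract (a codebook that tokenizes images, followed by a transformer operating on those tokens) can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the image resolution, token grid size, pose encoding (position plus quaternion), and all layer sizes are assumptions, and the paper's branching attention mechanism, which additionally yields camera pose estimates, is omitted for brevity.

```python
# Hypothetical sketch of the two-stage idea from the abstract; NOT the
# authors' code. Stage 1 quantizes each image into a small grid of
# discrete codes; stage 2 is a transformer that maps context-view codes
# plus camera poses to the codes of the query view.
import torch
import torch.nn as nn

class Codebook(nn.Module):
    """Stage 1: VQ-style encoder mapping a 128x128 image to 8x8 code indices."""
    def __init__(self, num_codes=1024, dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=4), nn.ReLU(),  # 128 -> 32
            nn.Conv2d(64, dim, kernel_size=4, stride=4),           # 32 -> 8
        )
        self.codes = nn.Embedding(num_codes, dim)

    def encode(self, images):                      # images: (B, 3, 128, 128)
        feats = self.encoder(images)               # (B, dim, 8, 8)
        flat = feats.flatten(2).transpose(1, 2)    # (B, 64, dim)
        # nearest-codebook-entry lookup per spatial location
        book = self.codes.weight.unsqueeze(0).expand(flat.size(0), -1, -1)
        return torch.cdist(flat, book).argmin(-1)  # (B, 64) token indices

class ViewTransformer(nn.Module):
    """Stage 2: predict the query view's tokens from context tokens + poses."""
    def __init__(self, num_codes=1024, dim=256):
        super().__init__()
        self.tok = nn.Embedding(num_codes, dim)
        self.pose = nn.Linear(7, dim)              # assumed: position + quaternion
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, num_codes)      # logits over the codebook

    def forward(self, ctx_tokens, ctx_poses, query_pose):
        # ctx_tokens: (B, V, 64), ctx_poses: (B, V, 7), query_pose: (B, 7)
        B, V, T = ctx_tokens.shape
        x = self.tok(ctx_tokens) + self.pose(ctx_poses).unsqueeze(2)  # pose per view
        x = x.reshape(B, V * T, -1)
        # query slots carry only the query pose; the transformer fills them in
        q = self.pose(query_pose).unsqueeze(1).expand(B, T, -1)
        out = self.backbone(torch.cat([x, q], dim=1))
        return self.head(out[:, -T:])              # (B, 64, num_codes)
```

Rendering a novel view then amounts to decoding the predicted query-view codes (via the codebook's decoder, not shown) back into an image. Because the transformer operates on a small token grid in a single forward pass, rather than evaluating millions of 3D point samples as in NeRF-style volume rendering, training and inference are comparatively fast, which is the speed advantage the abstract refers to.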

Files

978_3_031_19784_0_12.pdf
(pdf | 2.66 MB)
- Embargo expired on 01-07-2023
License info not available