A trainable activation scheme for applying Neural Radiance Fields in novel view synthesis tasks


Abstract

3D scene reconstruction is a common computer vision task with many applications. The synthesized virtual environments benefit many downstream applications such as 3D modeling, building inspection, and virtual reality. As conventional scene reconstruction methods often require expensive data collection and prior knowledge of the scene geometry, a learning-based method named Neural Radiance Fields (NeRF) has recently attracted great interest in the computer vision community for its ability to achieve state-of-the-art, photorealistic view synthesis from only a sequence of RGB images. However, NeRF strictly requires input images with accurate camera poses, which are often not available in real-life applications. To this end, we provide an end-to-end guideline for 3D scene reconstruction with NeRF in a real-world scenario, where the objective is to reconstruct an HDB facade in Singapore. This guideline requires only RGB images as input and achieves photorealistic rendering results that are competitive with those of the conventional point cloud-based method. In addition, we build our own models upon NeRF and improve its ability to represent fine details. We first show that NeRF's ability to represent high-frequency content is limited by the ReLU activations in its multi-layer perceptron (MLP), and demonstrate that mapping the MLP input from a low-dimensional to a high-dimensional space significantly improves reconstruction and view synthesis quality. We then explore several such input mappings. First, we replace NeRF's original positional encoding with Gaussian-distributed Fourier features. Next, we investigate the activations of the coordinate MLP and propose SIRENeRF, an embedding-less NeRF model equipped with parameterized sine activations. Finally, we extend parameterized activations from sine functions to a class of non-periodic activations and propose a trainable activation scheme that not only achieves better scene reconstruction quality but also offers greater flexibility across datasets. Experiments show that all of the above approaches outperform positional encoding in terms of scene reconstruction and view synthesis quality.
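
The following is a minimal sketch, not the thesis code, of two ideas summarized above: a Gaussian Fourier feature mapping that lifts low-dimensional coordinates to a higher-dimensional space before the MLP, and a sine activation with a trainable frequency in the spirit of SIREN (the same trainable-parameter idea extends to non-periodic activations). All names and hyperparameters here (e.g. `num_features`, `sigma`, `omega_0`, the layer widths) are illustrative assumptions, not the settings used in the thesis.

```python
import torch
import torch.nn as nn


class GaussianFourierFeatures(nn.Module):
    """Map coordinates x to [sin(2*pi*x@B), cos(2*pi*x@B)] with B ~ N(0, sigma^2)."""

    def __init__(self, in_dim: int = 3, num_features: int = 128, sigma: float = 10.0):
        super().__init__()
        # B is sampled once and kept fixed (non-trainable buffer).
        self.register_buffer("B", torch.randn(in_dim, num_features) * sigma)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        proj = 2.0 * torch.pi * x @ self.B            # (batch, num_features)
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)


class TrainableSine(nn.Module):
    """Sine activation sin(omega * x) with a learnable frequency omega."""

    def __init__(self, omega_0: float = 30.0):
        super().__init__()
        self.omega = nn.Parameter(torch.tensor(omega_0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sin(self.omega * x)


if __name__ == "__main__":
    # A small coordinate MLP standing in for the NeRF backbone (hypothetical sizes).
    encoder = GaussianFourierFeatures(in_dim=3, num_features=128, sigma=10.0)
    mlp = nn.Sequential(
        nn.Linear(256, 256), TrainableSine(),
        nn.Linear(256, 256), TrainableSine(),
        nn.Linear(256, 4),   # e.g. RGB + density
    )
    coords = torch.rand(1024, 3)          # random 3D sample points
    out = mlp(encoder(coords))
    print(out.shape)                       # torch.Size([1024, 4])
```

The key design choice illustrated is that both the frequency content of the input mapping (via `sigma`) and the frequency of the activation (via `omega`) control how well the network fits high-frequency detail; making the latter a learned parameter lets the model adapt it per dataset rather than fixing it by hand.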