Martin Weinmann
6 records found
Density-based Geometric Convergence of NeRFs at Training Time
Insights from Spatio-temporal Discretization
Whereas emerging learning-based scene representations are predominantly evaluated with image quality metrics such as PSNR, SSIM or LPIPS, only a few investigations evaluate the geometric accuracy of the underlying model. Rather than only demonstrating the geometric deviations of the fully optimized scene model, our work investigates the geometric convergence behavior during the optimization. For this purpose, we analyze the geometric convergence of discretized density fields: we derive point cloud representations from the density field at different training steps during the optimization of the scene representation and compare them using established point cloud metrics. This yields insights into which scene parts are already represented well at a given time during the optimization. By demonstrating that certain regions reach convergence earlier than others in the scene, we motivate future developments of locally guided optimization approaches that shift the computational burden to regions that still need to converge while leaving converged regions unchanged, which might help to further reduce training time and improve the achieved quality.
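The comparison of point clouds derived from a discretized density field could look roughly like the following minimal sketch. It is an illustration only: the abstract does not state how points are extracted or which metrics are used, so the density threshold, the voxel-center extraction, the symmetric Chamfer distance and the names `density_to_points` and `chamfer_distance` are all assumptions.

```python
import numpy as np


def density_to_points(density, threshold, voxel_size=1.0):
    """Return the centers of all voxels whose density exceeds `threshold`.

    Assumption: a simple threshold on a regular 3D grid; the paper may
    derive its point clouds differently.
    """
    occupied = np.argwhere(density > threshold)  # (N, 3) voxel indices
    return (occupied + 0.5) * voxel_size         # voxel centers


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Sum of the mean nearest-neighbor distances in both directions.
    Brute force, so only suitable for small point sets.
    """
    pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return pairwise.min(axis=1).mean() + pairwise.min(axis=0).mean()


if __name__ == "__main__":
    # Hypothetical density grids standing in for two training steps.
    rng = np.random.default_rng(0)
    early = rng.random((16, 16, 16))
    late = early.copy()
    late[:8] += 0.5  # pretend one half of the scene has densified further

    pts_early = density_to_points(early, threshold=0.9)
    pts_late = density_to_points(late, threshold=0.9)
    print("Chamfer distance between steps:", chamfer_distance(pts_early, pts_late))
```

Evaluating such a metric per region rather than over the whole grid would indicate which parts of the scene have stopped changing between training steps.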
In this paper, we focus on investigating the potential of advanced Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting for 3D scene reconstruction from aerial imagery obtained via sensor platforms with an almost nadir-looking camera. Such a setting for image acquisition is convenient for capturing large-scale urban scenes, yet it poses particular challenges arising from imagery with large overlap, very short baselines, similar viewing directions and an almost constant but large distance to the scene, and it therefore differs from the usual object-centric scene capture. We apply a traditional approach for image-based 3D reconstruction (COLMAP), a modern NeRF-based approach (Nerfacto) and a representative of the recently introduced 3D Gaussian Splatting approaches (Splatfacto), where the latter two are provided in the Nerfstudio framework. We analyze results achieved on the recently released UseGeo dataset both quantitatively and qualitatively. The achieved results reveal that the traditional COLMAP approach still outperforms the Nerfacto and Splatfacto approaches for various scene characteristics, such as less-textured areas, areas with high vegetation, shadowed areas and areas observed from only very few views.
PriNeRF
Prior constrained Neural Radiance Field for robust novel view synthesis of urban scenes with fewer views
Novel view synthesis (NVS) of urban scenes enables the exploration of cities virtually and interactively, which can further be used for urban planning, navigation, digital tourism, etc. However, many current NVS methods require a large number of images from known views as input and are sensitive to intrinsic and extrinsic camera parameters. In this paper, we propose a new unified framework for NVS of urban scenes with fewer required views via the integration of scene priors and the joint optimization of camera parameters under a geometric constraint along with NeRF weights. The integration of scene priors makes full use of the priors from the neighboring reference views to reduce the number of required known views. The joint optimization can correct the errors in camera parameters, which are usually derived from algorithms like Structure-from-Motion (SfM), and thereby further improves the quality of the generated novel views. Experiments show that our method achieves about 25.375 dB and 25.512 dB on average in terms of peak signal-to-noise ratio (PSNR) on synthetic and real data, respectively. It outperforms popular state-of-the-art methods (i.e., BungeeNeRF and MegaNeRF) by about 2–4 dB in PSNR. Notably, our method achieves results better than or competitive with those of the baseline method with only one third of the known view images required for the baseline. The code and dataset are available at https://github.com/Dongber/PriNeRF.
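For readers unfamiliar with the metric, the standard PSNR definition can be sketched as follows. This is a generic illustration and not taken from the PriNeRF code; the function name `psnr` and the assumption that images are scaled to the range [0, 1] are choices made here.

```python
import numpy as np


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    Assumes pixel values lie in [0, max_value].
    """
    diff = np.asarray(reference, dtype=np.float64) - np.asarray(rendered, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Hypothetical images: a reference and a noisy stand-in for a rendering.
    rng = np.random.default_rng(0)
    reference = rng.random((64, 64, 3))
    rendered = np.clip(reference + rng.normal(0.0, 0.05, reference.shape), 0.0, 1.0)
    print(f"PSNR: {psnr(reference, rendered):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, a gain of about 3 dB corresponds to roughly halving the error, so the reported 2–4 dB margin amounts to an error reduction by a factor of about 1.6 to 2.5.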
The Microsoft HoloLens is a head-worn mobile augmented reality device. It allows real-time 3D mapping of its immediate environment and self-localisation within the acquired 3D data. Both aspects are essential for robustly augmenting the local environment around the user with virtual content and for the robust interaction of the user with virtual objects. Although not primarily designed as an indoor mapping device, the Microsoft HoloLens has high potential for efficient and comfortable mapping of both room-scale and building-scale indoor environments. In this paper, we provide a survey on the capabilities of the Microsoft HoloLens (Version 1) for the efficient 3D mapping and modelling of indoor scenes. More specifically, we focus on its capabilities regarding localisation (in terms of pose estimation) within indoor environments and the spatial mapping of indoor environments. While the Microsoft HoloLens certainly cannot compete with laser scanners in providing highly accurate 3D data, we demonstrate that the acquired data is sufficiently accurate for a subsequent standard rule-based reconstruction of a semantically enriched and topologically correct model of an indoor scene. Furthermore, we discuss the robustness of standard handcrafted geometric features that are extracted from data acquired with the Microsoft HoloLens and typically used for a subsequent learning-based semantic segmentation.