How Can We Convert 2D Pixel Art into a 3D Voxel Representation?

Exploring different 3D reconstruction algorithms on 2D pixel input

Bachelor Thesis (2025)
Author(s)

T. Krajtmajer (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

P. Kellnhofer – Mentor (TU Delft - Computer Graphics and Visualisation)

M. Molenaar – Mentor (TU Delft - Computer Graphics and Visualisation)

J. Goncalves – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

E. Eisemann – Mentor (TU Delft - Computer Graphics and Visualisation)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
26-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This thesis investigates how standard 3D reconstruction techniques can be adapted to work with orthographic projections of pixel art images. Reconstructing 3D models from 2D images is typically done with real-world objects. However, little work has explored this problem in the context of pixel art, which has a low resolution and uses stylistic colors and shading, making it a problem that requires different or adapted techniques. We implement three reconstruction algorithms based on common practices in the field: silhouette-based intersection, spatial carving using color consistency, and gradient-based depth estimation. Results show that silhouette intersection is an effective method for simple models but fails to capture concave regions. Spatial carving addresses this limitation, although it too fails in some cases. An adapted depth estimation technique captures these regions most accurately, although it does not fill in the model as well. We also implement a custom Blender plugin that supports user annotations and improves accuracy in some ambiguous areas. We conclude that a hybrid approach, combining silhouette intersection with depth estimation, gives the most accurate results, and we suggest future work on the topic, including better color-merging principles, custom viewpoint orientations, support for multiple-object reconstruction, and the use of perspective projections.
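The silhouette-based intersection mentioned above is the classic visual-hull idea: under orthographic projection, a voxel survives only if it projects inside the object silhouette in every available view. The sketch below is not taken from the thesis; it is a minimal illustration of that principle, assuming three axis-aligned binary silhouettes (front, side, top) and the array axis conventions noted in the comments.

```python
import numpy as np

def silhouette_intersection(front, side, top):
    """Carve a voxel grid from three orthographic binary silhouettes.

    Assumed conventions (illustrative, not from the thesis):
      front[y, x] -- view along the z-axis
      side[y, z]  -- view along the x-axis
      top[z, x]   -- view along the y-axis
    A voxel (y, x, z) is kept only if it projects inside all three
    silhouettes, i.e. the intersection of the back-projected masks.
    """
    front = np.asarray(front, dtype=bool)
    side = np.asarray(side, dtype=bool)
    top = np.asarray(top, dtype=bool)
    # Broadcast each 2D mask along its missing axis, then intersect.
    return (front[:, :, None]        # (Y, X, 1): constant over z
            & side[:, None, :]       # (Y, 1, Z): constant over x
            & top.T[None, :, :])     # (1, X, Z): constant over y

# Example: a 2x2x2 cube with one pixel removed from the front silhouette.
front = np.array([[0, 1],
                  [1, 1]], dtype=bool)
side = np.ones((2, 2), dtype=bool)
top = np.ones((2, 2), dtype=bool)
voxels = silhouette_intersection(front, side, top)
# The hole at front[0, 0] removes the whole column of voxels behind it,
# so 6 of the 8 voxels remain.
```

This also shows why the method fails on concave regions: a cavity that is hidden inside every silhouette is never carved, which is the limitation the color-consistency and depth-estimation approaches address.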

Files

Rp_final-3-1.pdf
(pdf | 16.1 Mb)
License info not available