LeAP: Label any Pointcloud in any domain using Foundation Models

Master's Thesis (2024)
Author(s)

J.S. Gebraad (TU Delft - Mechanical Engineering)

Contributor(s)

Andras Palffy – Mentor (TU Delft - Intelligent Vehicles)

Holger Caesar – Graduation committee member (TU Delft - Intelligent Vehicles)

Faculty
Mechanical Engineering
Publication Year
2024
Language
English
Graduation Date
10-09-2024
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Robotics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

3D semantic understanding is essential for a wide range of robotics applications. The availability of datasets is a strong driver for research, and whilst obtaining unlabeled data is straightforward, manually annotating this data with semantic labels is time-consuming and costly. Recently, foundation models have facilitated open-set semantic segmentation, potentially aiding automatic labeling. However, these models have largely been limited to 2D images. This work introduces Label Any Pointcloud (LeAP), which leverages 2D Vision Foundation Models (VFMs) to automatically label 3D data with any set of classes in any kind of application. VFMs are used to create image labels for the desired classes, which are then projected to 3D points. Using Bayesian updates, point-wise labels are combined into voxels to improve label consistency and to label points outside the camera frustum. A novel Cross-Modal Self-Training (CM-ST) approach further enhances label quality. Through extensive experiments, we demonstrate that our method can generate high-quality 3D semantic labels across diverse fields without any manual 3D labeling. Models adapted to new application domains using our labels show up to 3.7× (12.9 → 47.1) mIoU improvement compared to the unadapted baselines. This ability to provide labels for any domain can help accelerate 3D perception research.
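To illustrate the voxel-level Bayesian fusion the abstract describes, the sketch below shows one way such an update could look. It is a minimal, hypothetical illustration, not the thesis's implementation: the noise model (a single symmetric correctness probability P_CORRECT), the uniform prior, and all function names are assumptions introduced here for clarity.

```python
import numpy as np

# Minimal sketch of Bayesian label fusion: each voxel keeps a categorical
# posterior over C classes, updated multiplicatively with the likelihood
# of every projected 2D observation that falls inside it.

NUM_CLASSES = 4   # hypothetical number of semantic classes
P_CORRECT = 0.8   # assumed probability that a projected 2D label is correct
EPS = 1e-9

def make_prior():
    """Uniform prior over classes for a freshly created voxel."""
    return np.full(NUM_CLASSES, 1.0 / NUM_CLASSES)

def bayes_update(posterior, observed_class):
    """One Bayesian update of a voxel's class posterior.

    posterior: (C,) current categorical belief for the voxel
    observed_class: int label projected from a 2D VFM segmentation
    """
    # Symmetric noise model: mass P_CORRECT on the observed class,
    # remainder spread uniformly over the other classes.
    likelihood = np.full(NUM_CLASSES, (1.0 - P_CORRECT) / (NUM_CLASSES - 1))
    likelihood[observed_class] = P_CORRECT
    posterior = posterior * likelihood
    return posterior / (posterior.sum() + EPS)

# Toy usage: three projected point labels land in the same voxel.
belief = make_prior()
for obs in [2, 2, 1]:                 # two votes for class 2, one for class 1
    belief = bayes_update(belief, obs)
voxel_label = int(np.argmax(belief))  # shared label for all points in the voxel,
                                      # including those outside the camera frustum
print(belief, voxel_label)
```

Because the belief is stored per voxel rather than per point, points that were never seen by any camera can inherit the voxel's fused label, which is how fusion of this kind can label geometry outside the camera frustum.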

Files

License info not available

File under embargo until 09-09-2026