LeAP: Label any Pointcloud in any domain using Foundation Models

Master's Thesis (2024)
Author(s)

J.S. Gebraad (TU Delft - Mechanical Engineering)

Contributor(s)

Andras Palffy – Mentor (TU Delft - Intelligent Vehicles)

Holger Caesar – Graduation committee member (TU Delft - Intelligent Vehicles)

Faculty
Mechanical Engineering
Publication Year
2024
Language
English
Graduation Date
10-09-2024
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Robotics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

3D semantic understanding is essential for a wide range of robotics applications. The availability of datasets is a strong driver for research, and whilst obtaining unlabeled data is straightforward, manually annotating this data with semantic labels is time-consuming and costly. Recently, foundation models have facilitated open-set semantic segmentation, potentially aiding automatic labeling. However, these models have largely been limited to 2D images. This work introduces Label Any Pointcloud (LeAP), which leverages 2D Vision Foundation Models (VFMs) to automatically label 3D data with any set of classes in any kind of application. VFMs are used to create image labels for the desired classes, which are then projected to 3D points. Using Bayesian updates, point-wise labels are combined into voxels to improve label consistency and to label points outside the camera frustum. A novel Cross-Modal Self-Training (CM-ST) approach further enhances label quality. Through extensive experiments, we demonstrate that our method can generate high-quality 3D semantic labels across diverse fields without any manual 3D labeling. Models adapted to new application domains using our labels show up to 3.7× (12.9 → 47.1) mIoU improvement compared to the unadapted baselines. This ability to provide labels for any domain can help accelerate 3D perception research.
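To illustrate the voxel-level Bayesian fusion the abstract describes, the sketch below shows one way such an update could look. It is a minimal, hypothetical illustration, not the thesis's implementation: the noise model (a single symmetric correctness probability P_CORRECT), the uniform prior, and all function names are assumptions introduced here for clarity.

```python
import numpy as np

# Minimal sketch of Bayesian label fusion: each voxel keeps a categorical
# posterior over C classes, updated multiplicatively with the likelihood
# of every projected 2D observation that falls inside it.

NUM_CLASSES = 4   # hypothetical number of semantic classes
P_CORRECT = 0.8   # assumed probability that a projected 2D label is correct
EPS = 1e-9

def make_prior():
    """Uniform prior over classes for a freshly created voxel."""
    return np.full(NUM_CLASSES, 1.0 / NUM_CLASSES)

def bayes_update(posterior, observed_class):
    """One Bayesian update of a voxel's class posterior.

    posterior: (C,) current categorical belief for the voxel
    observed_class: int label projected from a 2D VFM segmentation
    """
    # Symmetric noise model: mass P_CORRECT on the observed class,
    # remainder spread uniformly over the other classes.
    likelihood = np.full(NUM_CLASSES, (1.0 - P_CORRECT) / (NUM_CLASSES - 1))
    likelihood[observed_class] = P_CORRECT
    posterior = posterior * likelihood
    return posterior / (posterior.sum() + EPS)

# Toy usage: three projected point labels land in the same voxel.
belief = make_prior()
for obs in [2, 2, 1]:                 # two votes for class 2, one for class 1
    belief = bayes_update(belief, obs)
voxel_label = int(np.argmax(belief))  # shared label for all points in the voxel,
                                      # including those outside the camera frustum
print(belief, voxel_label)
```

Because the belief is stored per voxel rather than per point, points that were never seen by any camera can inherit the voxel's fused label, which is how fusion of this kind can label geometry outside the camera frustum.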

Files

License info not available

File under embargo until 09-09-2026