CrossTracker: Robust Multi-Modal 3D Multi-Object Tracking via Cross Correction
Lipeng Gu (Nanjing University of Aeronautics and Astronautics)
Xuefeng Yan (Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University of Aeronautics and Astronautics)
Weiming Wang (Hong Kong Metropolitan University)
Honghua Chen (Lingnan University, Hong Kong)
Dingkun Zhu (Jiangsu University of Technology)
Liangliang Nan (TU Delft - Urban Data Science)
Mingqiang Wei (Nanjing University of Aeronautics and Astronautics)
Abstract
Inaccurate detections remain a critical bottleneck in 3D multi-object tracking (MOT). Recent detection fusion-based methods incorporate camera detections as a supplement to reduce false detections and compensate for missing ones in LiDAR. However, their unidirectional camera-to-LiDAR correction lacks a feedback mechanism, precluding iterative mutual refinement between modalities for more robust LiDAR-based tracking. Inspired by the coarse-to-fine strategy in two-stage object detection, we introduce CrossTracker, a novel two-stage framework for online multi-modal 3D MOT. CrossTracker first constructs coarse camera and LiDAR trajectories independently, then performs trajectory fusion using both current and historical frames, without requiring future data. This enables more robust mutual refinement between modalities. Specifically, CrossTracker comprises three core modules: i) the multi-modal modeling (M3) module, which fuses data from images, point clouds, and even planar geometry derived from images to establish a robust tracking constraint; ii) the coarse trajectory generation (C-TG) module, which independently generates coarse trajectories for both modalities using the M3 constraint; and iii) the trajectory fusion (TF) module, which applies mutual refinement between coarse LiDAR and camera trajectories through cross correction to ensure robust LiDAR trajectories. Extensive experiments show that CrossTracker outperforms 19 state-of-the-art methods, highlighting its effectiveness in leveraging the synergistic strengths of camera and LiDAR sensors for robust multi-modal 3D MOT. The code is available at https://github.com/lipeng-gu/CrossTracker.
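To make the two-stage flow concrete, here is a minimal, hypothetical Python sketch of the pipeline the abstract describes: coarse trajectories are first built independently per modality, and a fusion step then lets camera tracks recover LiDAR misses. All names (`Track`, `associate`, `cross_correct`), the 1-D center distances, and the thresholds are illustrative assumptions, not the paper's actual M3 constraint or released code.

```python
# Hypothetical sketch of a two-stage coarse-trajectory + cross-correction
# pipeline, inspired by the abstract. Not the authors' implementation.
from dataclasses import dataclass, field


@dataclass
class Track:
    track_id: int
    boxes: list = field(default_factory=list)  # per-frame 1-D box centers


def associate(tracks, detections, max_dist=2.0):
    """Greedy nearest-center association (toy stand-in for the M3 constraint)."""
    for det in detections:
        best, best_d = None, max_dist
        for tr in tracks:
            d = abs(tr.boxes[-1] - det)
            if d < best_d:
                best, best_d = tr, d
        if best is not None:
            best.boxes.append(det)          # extend an existing coarse trajectory
        else:
            tracks.append(Track(track_id=len(tracks), boxes=[det]))
    return tracks


def cross_correct(lidar_tracks, camera_tracks, max_dist=2.0):
    """Simplified TF step: a camera track with no nearby LiDAR counterpart
    is promoted to compensate for a missed LiDAR trajectory."""
    fused = list(lidar_tracks)
    for cam in camera_tracks:
        if all(abs(cam.boxes[-1] - lt.boxes[-1]) > max_dist for lt in fused):
            fused.append(cam)  # recover a detection the LiDAR branch missed
    return fused


# Toy frame: LiDAR sees two objects, the camera also sees a third near x=5.0.
lidar = associate([], [0.0, 10.0])
camera = associate([], [0.1, 10.1, 5.0])
fused = cross_correct(lidar, camera)
print(len(fused))  # -> 3: two LiDAR tracks plus one camera-recovered track
```

In the actual method the refinement is mutual (LiDAR also corrects camera trajectories, over current and historical frames); the sketch shows only the camera-to-LiDAR recovery direction for brevity.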