CrossTracker

Robust Multi-Modal 3D Multi-Object Tracking via Cross Correction

Journal Article (2026)
Author(s)

Lipeng Gu (Nanjing University of Aeronautics and Astronautics)

Xuefeng Yan (Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University of Aeronautics and Astronautics)

Weiming Wang (Hong Kong Metropolitan University)

Honghua Chen (Lingnan University, Hong Kong)

Dingkun Zhu (Jiangsu University of Technology)

Liangliang Nan (TU Delft - Urban Data Science)

Mingqiang Wei (Nanjing University of Aeronautics and Astronautics)

Research Group
Urban Data Science
DOI related publication
https://doi.org/10.1109/TCSVT.2025.3601667
Publication Year
2026
Language
English
Issue number
2
Volume number
36
Pages (from-to)
2191-2206
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Inaccurate detections remain a critical bottleneck in 3D multi-object tracking (MOT). Recent detection fusion-based methods incorporate camera detections as a supplement to reduce false detections and compensate for missed LiDAR detections. However, their unidirectional camera-to-LiDAR correction lacks a feedback mechanism, precluding iterative mutual refinement between modalities for more robust LiDAR-based tracking. Inspired by the coarse-to-fine strategy in two-stage object detection, we introduce CrossTracker, a novel two-stage framework for online multi-modal 3D MOT. CrossTracker first constructs coarse camera and LiDAR trajectories independently, then performs trajectory fusion using both current and historical frames, without requiring future data. This ensures more robust mutual refinement between modalities. Specifically, CrossTracker comprises three core modules: i) the multi-modal modeling (M3) module, which fuses data from images, point clouds, and even planar geometry derived from images to establish a robust tracking constraint; ii) the coarse trajectory generation (C-TG) module, which independently generates coarse trajectories for both modalities using the M3 constraint; and iii) the trajectory fusion (TF) module, which applies mutual refinement between coarse LiDAR and camera trajectories through cross correction to ensure robust LiDAR trajectories. Extensive experiments show that CrossTracker outperforms 19 state-of-the-art methods, highlighting its effectiveness in leveraging the synergistic strengths of camera and LiDAR sensors for robust multi-modal 3D MOT. The code is available at https://github.com/lipeng-gu/CrossTracker.
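To make the two-stage idea concrete, the toy sketch below illustrates the overall shape of the pipeline only: coarse per-modality trajectories built by simple nearest-neighbour association (a crude stand-in for the C-TG module, ignoring the M3 constraint entirely), followed by a fusion step in which a camera trajectory fills in frames a LiDAR trajectory missed (one direction of the cross correction performed by the TF module). All names, thresholds, and data structures here are hypothetical illustrations, not the authors' implementation; see the linked repository for the real method.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    """Hypothetical trajectory: maps frame index -> 2D position."""
    track_id: int
    positions: dict = field(default_factory=dict)

def coarse_tracks(detections_per_frame, max_dist=2.0):
    """Greedy nearest-neighbour association into coarse trajectories.
    A crude stand-in for coarse trajectory generation (C-TG); the paper's
    multi-modal M3 constraint is far richer than this distance gate."""
    tracks, next_id = [], 0
    for frame, dets in enumerate(detections_per_frame):
        unmatched = list(dets)
        for tr in tracks:
            if frame - 1 not in tr.positions or not unmatched:
                continue  # track was not alive in the previous frame
            px, py = tr.positions[frame - 1]
            best = min(unmatched, key=lambda d: (d[0] - px) ** 2 + (d[1] - py) ** 2)
            if (best[0] - px) ** 2 + (best[1] - py) ** 2 <= max_dist ** 2:
                tr.positions[frame] = best
                unmatched.remove(best)
        for d in unmatched:  # leftover detections start new tracks
            tracks.append(Track(next_id, {frame: d}))
            next_id += 1
    return tracks

def cross_correct(lidar_tracks, camera_tracks, match_dist=2.0):
    """Toy trajectory-fusion step: match a LiDAR track to a camera track that
    overlaps in time and space, then fill frames the LiDAR track missed with
    the camera estimate (one direction of the paper's mutual refinement)."""
    for lt in lidar_tracks:
        for ct in camera_tracks:
            shared = set(lt.positions) & set(ct.positions)
            if not shared:
                continue
            f = next(iter(shared))
            lx, ly = lt.positions[f]
            cx, cy = ct.positions[f]
            if (lx - cx) ** 2 + (ly - cy) ** 2 > match_dist ** 2:
                continue  # spatially inconsistent: not the same object
            for f2, pos in ct.positions.items():
                lt.positions.setdefault(f2, pos)  # compensate missed detections
            break
    return lidar_tracks
```

In this sketch the correction runs camera-to-LiDAR only; the paper's TF module refines in both directions, which is precisely the feedback loop that unidirectional fusion methods lack.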

Files

CrossTracker_Robust_Multi-Moda... (pdf)
(pdf | 13.7 MB)
Embargo expired on 22-02-2026
Taverne