Copy-Pasting Coherent Depth Regions Improves Contrastive Learning for Urban-Scene Segmentation

Conference Paper (2022)

Author(s)

L. Zeng (Student TU Delft)

A. Lengyel (TU Delft - Electrical Engineering, Mathematics and Computer Science)

N. Tömen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J.C. van Gemert (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group

Pattern Recognition and Bioinformatics

To reference this document use

https://resolver.tudelft.nl/uuid:92b07c5d-d387-47ca-b542-aba467a34e2e

Publication Year

2022

Language

English

Event

33rd British Machine Vision Conference 2022 (2022-11-21 - 2022-11-24), London, United Kingdom

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this work, we leverage estimated depth to boost self-supervised contrastive learning for segmentation of urban scenes, where unlabeled videos are readily available for training self-supervised depth estimation. We argue that the semantics of a coherent group of pixels in 3D space is self-contained and invariant to the contexts in which it appears. We group semantically related pixels into coherent depth regions based on their estimated depth and use copy-paste to synthetically vary their contexts. In this way, cross-context correspondences are built in contrastive learning and a context-invariant representation is learned. For unsupervised semantic segmentation of urban scenes, our method surpasses the previous state-of-the-art baseline by +7.14% in mIoU on Cityscapes and +6.65% on KITTI. For fine-tuning on Cityscapes and KITTI segmentation, our method is competitive with existing models, yet it requires no pre-training on ImageNet or COCO and is more computationally efficient. Our code is available at https://github.com/LeungTsang/CPCDR.
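
The copy-paste idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that): the abstract does not specify how depth regions are formed, so the sketch assumes a crude stand-in that groups pixels by depth bin. The function names `depth_region_masks` and `copy_paste` and the parameter `num_bins` are hypothetical and chosen for illustration only.

```python
import numpy as np


def depth_region_masks(depth, num_bins=4):
    """Group pixels into regions by depth bin.

    Assumption: equal-width depth binning stands in for the paper's
    coherent depth regions, whose construction the abstract does not give.
    Returns a list of boolean masks, one per bin, each shaped like `depth`.
    """
    edges = np.linspace(depth.min(), depth.max(), num_bins + 1)
    # Using only the interior edges gives bin ids in 0 .. num_bins - 1.
    bin_ids = np.digitize(depth, edges[1:-1])
    return [bin_ids == b for b in range(num_bins)]


def copy_paste(source, target, mask):
    """Paste the source pixels selected by `mask` onto a copy of `target`.

    The pasted region keeps its content but now appears in a new context,
    so (source, output) share a region that could serve as a positive pair.
    """
    out = target.copy()
    out[mask] = source[mask]
    return out


# Toy example: the left half of the scene is near, the right half is far.
depth = np.full((4, 4), 1.0)
depth[:, 2:] = 10.0
source = np.ones((4, 4, 3))   # image that owns the region
target = np.zeros((4, 4, 3))  # image that provides the new context

masks = depth_region_masks(depth, num_bins=2)
pasted = copy_paste(source, target, masks[1])  # paste the far region
```

In this toy case the far region (right half) of `source` lands on `target`, while the left half of `target` is left untouched. A contrastive objective would then pull together the features of the same region across the two contexts; that part is omitted here.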

Files

0893.pdf

(pdf | 10.6 MB)

License info not available