Semantic Alignment in Multiple Stages of Networks for Person Re-ID

Towards Generalizable Models

Master Thesis (2019)
Author(s)

R.S.D. Autar (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

H.S. Hung – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

L.C. Cabrera-Quiros – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Henri Bouma – Graduation committee member (TNO)

Arthur van Rooijen – Graduation committee member (TNO)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2019 Ravi Autar
Publication Year
2019
Language
English
Graduation Date
12-11-2019
Awarding Institution
Delft University of Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Person re-identification (re-ID) aims to associate images of the same person across different cameras. One of the key problems a person re-ID system must address to achieve good performance is feature misalignment. Past research has attempted to address this problem with attention networks, pose-estimation modules, or semantic segmentation networks. However, these approaches all eventually pool the features into a single embedding, failing to distinguish regions with different semantic meanings such as the head, torso, and lower body. Most approaches also do not use the information available throughout multiple layers (stages) of the feature extractor. Furthermore, although these additional features provide extra information to the re-ID network during training, they do not account for the varying importance of different image regions due to, for example, occlusion. To address these problems, we propose a network that extracts regional feature embeddings associated with specific body parts of an identity: head, upper body, lower body, shoes, and the foreground image. We extract these features from multiple stages of a feature extractor using a semantic-segmentation module. We then use multi-branch learning to ensure that these features are optimized independently, introducing a separate module (branch) for each regional feature embedding. To increase the robustness of the model, we also propose a novel testing strategy that uses the importance and visibility of specific body parts in both the query and gallery images to compute a ranking list. Finally, to address the current lack of datasets containing images from overhead face-down cameras, we introduce a new dataset named MatchNMingle-reID. Because of the camera viewpoint, this dataset presents unique challenges not seen in current datasets and opens possibilities to create more generalizable models that can effectively address the feature misalignment problem.
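The visibility-aware ranking strategy described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's exact formulation: the function names, the weighting scheme (product of query and gallery visibility scores), and the use of Euclidean per-part distances are all assumptions made for the sketch.

```python
import numpy as np

def part_aware_distance(query_parts, gallery_parts, query_vis, gallery_vis):
    """Visibility-weighted distance over regional embeddings (illustrative).

    query_parts, gallery_parts: (P, D) arrays, one row per body part
    query_vis, gallery_vis: (P,) visibility/importance scores in [0, 1]
    """
    # A part contributes only to the extent it is visible in both images.
    weights = query_vis * gallery_vis
    if weights.sum() == 0:
        return float("inf")  # no shared visible parts to compare
    # Per-part Euclidean distances between corresponding embeddings.
    dists = np.linalg.norm(query_parts - gallery_parts, axis=1)
    return float((weights * dists).sum() / weights.sum())

def rank_gallery(query_parts, query_vis, gallery):
    """Return gallery indices sorted by ascending part-aware distance."""
    scores = [part_aware_distance(query_parts, g_parts, query_vis, g_vis)
              for g_parts, g_vis in gallery]
    return list(np.argsort(scores))
```

Under this scheme, an occluded part (visibility near zero in either image) is effectively excluded from the match score, so a gallery image with a hidden lower body is compared to the query only on the parts both images actually show.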
