Semantic Alignment in Multiple Stages of Networks for Person Re-ID

Towards Generalizable Models

Master Thesis (2019)
Author(s)

R.S.D. Autar (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

H.S. Hung – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

L.C. Cabrera-Quiros – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Henri Bouma – Graduation committee member (TNO)

Arthur van Rooijen – Graduation committee member (TNO)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2019 Ravi Autar
Publication Year
2019
Language
English
Graduation Date
12-11-2019
Awarding Institution
Delft University of Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Person re-identification (re-ID) aims to associate images of the same person across different cameras. One of the key problems a person re-ID system must address to achieve good performance is feature misalignment. Past research has attempted to address this problem with attention networks, pose-estimation modules, or semantic segmentation networks. However, these approaches all eventually pool the features into a single embedding, failing to distinguish regions with different semantic meanings such as the head, torso, and lower body. Most approaches also do not use the information available throughout multiple layers (stages) of the feature extractor. Furthermore, although these additional features provide extra information to the re-ID network during training, they do not account for the varying importance of different image regions due to, for example, occlusion. To address these problems, we propose a network that extracts regional feature embeddings associated with specific body parts of an identity: head, upper body, lower body, shoes, and the foreground image. We extract these features from multiple stages of a feature extractor using a semantic-segmentation module. We then use multi-branch learning to ensure that these features are optimized independently, introducing a separate module (branch) for each regional feature embedding. To increase the robustness of the model, we also propose a novel testing strategy that uses the importance and visibility of specific body parts in both the query and gallery images to compute a ranking list. Finally, to address the current lack of datasets containing images from overhead face-down cameras, we introduce a new dataset named MatchNMingle-reID. Because of the camera viewpoint, this dataset presents unique challenges not seen in current datasets and opens possibilities to create more generalizable models that can effectively address the feature misalignment problem.
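The visibility-aware ranking strategy described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's exact formulation: the function names, the weighting scheme (product of query and gallery visibility scores), and the use of Euclidean per-part distances are all assumptions made for the sketch.

```python
import numpy as np

def part_aware_distance(query_parts, gallery_parts, query_vis, gallery_vis):
    """Visibility-weighted distance over regional embeddings (illustrative).

    query_parts, gallery_parts: (P, D) arrays, one row per body part
    query_vis, gallery_vis: (P,) visibility/importance scores in [0, 1]
    """
    # A part contributes only to the extent it is visible in both images.
    weights = query_vis * gallery_vis
    if weights.sum() == 0:
        return float("inf")  # no shared visible parts to compare
    # Per-part Euclidean distances between corresponding embeddings.
    dists = np.linalg.norm(query_parts - gallery_parts, axis=1)
    return float((weights * dists).sum() / weights.sum())

def rank_gallery(query_parts, query_vis, gallery):
    """Return gallery indices sorted by ascending part-aware distance."""
    scores = [part_aware_distance(query_parts, g_parts, query_vis, g_vis)
              for g_parts, g_vis in gallery]
    return list(np.argsort(scores))
```

Under this scheme, an occluded part (visibility near zero in either image) is effectively excluded from the match score, so a gallery image with a hidden lower body is compared to the query only on the parts both images actually show.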
