Semantic Alignment in Multiple Stages of Networks for Person Re-ID

Towards Generalizable Models

Abstract

Person re-identification (re-ID) aims to associate the same person across different cameras. One important problem a person re-ID system must address to achieve good performance is feature misalignment. Past research has attempted to address this problem with attention networks, pose-estimation modules, or semantic segmentation networks. However, these approaches tend to pool the resulting features into a single feature embedding, and therefore do not distinguish regions with different semantic meanings such as the head, torso, and lower body. Most approaches also do not use all the information available across multiple layers (stages) of the feature extractor. Furthermore, although these additional features provide extra information to the re-ID network during training, they do not account for the varying importance of different image regions caused by, for example, occlusion. To address these problems, we propose a network that extracts regional feature embeddings associated with specific body parts of an identity, i.e., head, upper body, lower body, shoes, and the foreground image. We extract these features from multiple stages of a feature extractor using a semantic-segmentation module. We then use multi-branch learning, introducing a separate module (branch) for each regional feature embedding, to ensure that these features are optimized independently. To increase the robustness of the model, we also propose a novel testing strategy that uses the importance and visibility of specific body parts in both the query and gallery images to calculate a ranking list. Finally, to address the current lack of datasets containing images from overhead face-down cameras, we introduce a new dataset named MatchNMingle-reID. Because of the viewpoint of the cameras, this dataset presents unique challenges not seen in current datasets and opens possibilities for creating more generalizable models that can effectively address the feature misalignment problem.
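
The abstract does not state how importance and visibility enter the ranking computation, so the following is only a minimal sketch of one plausible reading: each body part contributes a cosine distance, weighted by how visible that part is in both the query and the gallery image. The function names, the dictionary layout, the visibility scores in [0, 1], and the optional `importance` weights are all assumptions made for illustration, not the formulation used in the work itself.

```python
import math

# Region names come from the abstract; their order here is arbitrary.
PARTS = ("head", "upper_body", "lower_body", "shoes", "foreground")


def cosine_distance(u, v):
    """Cosine distance between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 1.0


def weighted_part_distance(query, gallery_item, importance=None):
    """Visibility-weighted mean of per-part distances (assumed formulation).

    `query` and `gallery_item` map a part name to an
    (embedding, visibility) pair, with visibility a score in [0, 1].
    `importance` optionally maps a part name to a non-negative weight.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for part in PARTS:
        q_emb, q_vis = query[part]
        g_emb, g_vis = gallery_item[part]
        prior = 1.0 if importance is None else importance[part]
        # A part only counts to the extent it is visible in BOTH images.
        weight = prior * q_vis * g_vis
        weighted_sum += weight * cosine_distance(q_emb, g_emb)
        total_weight += weight
    # With no shared visible part the pair cannot be compared; rank it last.
    return weighted_sum / total_weight if total_weight > 0 else math.inf


def rank_gallery(query, gallery, importance=None):
    """Return (order, distances); order lists gallery indices, best match first."""
    distances = [weighted_part_distance(query, g, importance) for g in gallery]
    order = sorted(range(len(gallery)), key=lambda i: distances[i])
    return order, distances


def make_item(embeddings, visibilities):
    """Helper for the example: bundle per-part embeddings and visibilities."""
    return {p: (e, v) for p, e, v in zip(PARTS, embeddings, visibilities)}


same, other = [1.0, 0.0], [0.0, 1.0]
query = make_item([same] * 5, [1.0] * 5)
gallery = [
    make_item([other] * 5, [1.0] * 5),                                # different person
    make_item([same] * 5, [0.0] * 5),                                 # nothing visible
    make_item([same] * 5, [1.0] * 5),                                 # clean match
    make_item([same, same, other, other, same], [1, 1, 0, 0, 1]),     # legs occluded
]
order, distances = rank_gallery(query, gallery)
```

In this toy example the gallery image with occluded lower body and shoes is not penalized for the mismatching features in those regions, because their weight is zero. Under an unweighted average over all five parts, the same image would be pushed away from the clean match.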