Recurrent Affine Transform Encoder for Image Representation

Journal Article (2022)
Author(s)

Letao Liu (Nanyang Technological University)

Xudong Jiang (Nanyang Technological University)

Martin Saerbeck (TÜV SÜD Asia Pacific)

J. Dauwels (TU Delft - Signal Processing Systems)

Research Group
Signal Processing Systems
Copyright
© 2022 Letao Liu, Xudong Jiang, Martin Saerbeck, J.H.G. Dauwels
DOI related publication
https://doi.org/10.1109/ACCESS.2022.3150340
Publication Year
2022
Language
English
Volume number
10
Pages (from-to)
18653-18666
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper proposes a Recurrent Affine Transform Encoder (RATE) for image representation learning. The proposed learning architecture enables a CNN encoder to learn the affine transform parameters of images: it decomposes an affine transform matrix into two transform matrices and learns them jointly in a self-supervised manner. RATE is trained on unlabeled image data without any ground truth and recurrently infers the affine transform parameters of input images. The inferred parameters can be used to represent images in canonical form, greatly reducing image variations under affine transforms such as rotation, scaling, and translation. Unlike the spatial transformer network, RATE does not need to be embedded into other networks and trained with the aid of other learning objectives. We show that RATE learns the affine transform parameters of images and achieves impressive image representation results in terms of invariance to translation, scaling, and rotation. We also show that incorporating RATE into an existing classification model enhances classification performance and improves robustness against distortion.
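The core idea in the abstract — estimating affine transform parameters and using their inverse to map an image back to canonical form — can be illustrated with a small numpy sketch. This is not the paper's network: RATE learns the parameters with a CNN encoder, whereas the function names and the rotation-scale-translation parameterization below are illustrative assumptions showing only the affine composition, decomposition, and canonicalization steps.

```python
import numpy as np

def compose_affine(theta, s, tx, ty):
    """Build a 2x3 affine matrix from rotation theta, isotropic scale s,
    and translation (tx, ty) -- one simple parameterization of the
    affine transforms (rotation, scaling, translation) the paper targets."""
    c, si = np.cos(theta), np.sin(theta)
    return np.array([[s * c, -s * si, tx],
                     [s * si,  s * c, ty]])

def decompose_affine(A):
    """Recover (theta, s, tx, ty) from a rotation+scale+translation matrix."""
    theta = np.arctan2(A[1, 0], A[0, 0])
    s = np.hypot(A[0, 0], A[1, 0])
    return theta, s, A[0, 2], A[1, 2]

def canonicalize(points, A):
    """Undo the estimated transform: map points through the inverse affine,
    analogous to rendering an image in canonical form."""
    R, t = A[:, :2], A[:, 2]
    return (np.linalg.inv(R) @ (points.T - t[:, None])).T
```

Given an estimated matrix `A`, applying `canonicalize` to transformed coordinates recovers the original ones, which is what makes the downstream representation invariant to the estimated rotation, scaling, and translation.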