Object extent pooling for weakly supervised single-shot localization

Conference Paper (2017)
Author(s)

Amogh Gudi (Vicarious Perception Technologies, TU Delft - Electrical Engineering, Mathematics and Computer Science)

Nicolai Van Rosmalen (Vicarious Perception Technologies)

Marco Loog (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Jan Van Gemert (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group
Pattern Recognition and Bioinformatics
DOI related publication
https://doi.org/10.5244/c.31.36 Final published version
Publication Year
2017
Language
English
ISBN (electronic)
9781901725605
Event
28th British Machine Vision Conference, BMVC 2017 (2017-09-04 - 2017-09-07), London, United Kingdom
Collections
Institutional Repository
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

When detailed training annotations are scarce, the ability to perform object localization in real time with weak supervision is very valuable. However, the computational cost of generating and evaluating region proposals is heavy. We adapt the concept of Class Activation Maps (CAM) [28] into the very first weakly supervised ‘single-shot’ detector that does not require region proposals. To facilitate this, we propose a novel global pooling technique called Spatial Pyramid Averaged Max (SPAM) pooling for training this CAM-based network for object extent localization with only weak image-level supervision. We show that this global pooling layer possesses a near-ideal flow of gradients for extent localization and offers a good trade-off between the extremes of max and average pooling. Our approach requires only a single network pass and uses a fast backprojection technique, completely omitting any region proposal steps. To the best of our knowledge, this is the first approach to do so. As a result, we are able to perform inference in real time at 35 fps, which is an order of magnitude faster than all previous weakly supervised object localization frameworks.
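
As a rough illustration of how a pooling layer can sit between max and average pooling, the Python sketch below computes an averaged max over a spatial pyramid for a single 2D activation map. The abstract does not specify the pyramid levels or the exact averaging scheme, so the function name `spam_pool`, the `levels` parameter, and the grid-splitting rule are assumptions made for illustration, not the authors' implementation.

```python
def spam_pool(act, levels=(1, 2, 4)):
    """Averaged max over a spatial pyramid (illustrative sketch only).

    Assumed scheme: for each pyramid level g, split the map into a
    g x g grid, take the max of every cell, average those maxima,
    then average the per-level results.
    """
    h, w = len(act), len(act[0])
    level_means = []
    for g in levels:
        if g > h or g > w:
            raise ValueError("grid size exceeds activation map size")
        cell_maxima = []
        for i in range(g):
            # Row bounds of grid cell i (cells are non-empty since g <= h).
            r0, r1 = i * h // g, (i + 1) * h // g
            for j in range(g):
                # Column bounds of grid cell j.
                c0, c1 = j * w // g, (j + 1) * w // g
                cell_maxima.append(
                    max(act[r][c] for r in range(r0, r1) for c in range(c0, c1))
                )
        level_means.append(sum(cell_maxima) / len(cell_maxima))
    return sum(level_means) / len(level_means)


# A 4x4 map with a single active location.
activation = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

global_max = spam_pool(activation, levels=(1,))      # 1x1 grid: plain max pooling
global_avg = spam_pool(activation, levels=(4,))      # 4x4 grid on a 4x4 map: plain average
pyramid = spam_pool(activation, levels=(1, 2, 4))    # value between the two extremes
```

Under these assumptions, a single-level pyramid with a 1x1 grid reduces to global max pooling, and a grid as fine as the map itself reduces to global average pooling; combining several levels yields a value between the two.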

Files

Paper036.pdf
(pdf | 6.51 Mb)
License info not available