Top-Down Networks

A coarse-to-fine reimagination of CNNs

Biological vision adopts a coarse-to-fine information processing pathway, from the initial visual detection and binding of salient features of a visual scene to the enhanced and preferential processing given to relevant stimuli. In contrast, CNNs employ fine-to-coarse processing, moving from local, edge-detecting filters to more global ones that extract abstract representations of the input. In this paper we propose top-down networks, obtained by reversing the feature extraction part of the baseline bottom-up architecture. By blurring out higher-frequency information and restoring it only at later stages, this coarse-to-fine pathway offers a line of defence against attacks that introduce high-frequency noise. In addition, the high resolution of the final convolutional layer's feature map can contribute to the transparency of the network's decision-making process and favor object-driven decisions over context-driven ones, thus providing better-localized class activation maps. The paper offers empirical evidence for the applicability of the method to various existing architectures, as well as to multiple visual recognition tasks.
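To make the coarse-to-fine idea concrete, here is a minimal NumPy sketch. It is a toy illustration under stated assumptions, not the authors' implementation. It assumes a single-channel image, 2x2 average pooling as the blurring step, nearest-neighbour upsampling, fusion by addition, and a fixed `tanh` standing in for each learned convolutional block. All function names (`avg_pool2`, `upsample2`, `image_pyramid`, `top_down_forward`) are hypothetical. The first stage sees only the most blurred input, and finer detail is added back at each later stage, so the last feature map has the full input resolution.

```python
from __future__ import annotations

import numpy as np


def avg_pool2(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling: a simple low-pass filter followed by subsampling."""
    h, w = x.shape
    assert h % 2 == 0 and w % 2 == 0, "toy sketch assumes even spatial sizes"
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample2(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def image_pyramid(img: np.ndarray, levels: int) -> list[np.ndarray]:
    """Return [finest, ..., coarsest] versions of a single-channel image."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(avg_pool2(pyr[-1]))
    return pyr


def top_down_forward(img: np.ndarray, levels: int = 3) -> list[np.ndarray]:
    """Toy coarse-to-fine pass; returns the feature map after each stage.

    Stage 0 sees only the coarsest (most blurred) input. Every later stage
    upsamples the running features and fuses them with the next-finer input,
    so fine detail is reintroduced only towards the end. The fixed tanh is a
    hypothetical placeholder for a learned convolutional block.
    """
    pyr = image_pyramid(img, levels)[::-1]  # coarsest first
    feats = np.tanh(pyr[0])
    outputs = [feats]
    for finer in pyr[1:]:
        feats = np.tanh(upsample2(feats) + finer)  # fusion by addition (assumed)
        outputs.append(feats)
    return outputs


rng = np.random.default_rng(0)
clean = rng.random((8, 8))
# Checkerboard perturbation: the highest spatial frequency on the pixel grid.
yy, xx = np.indices(clean.shape)
noise = 0.1 * np.where((yy + xx) % 2 == 0, 1.0, -1.0)
attacked = clean + noise

out_clean = top_down_forward(clean)
out_attacked = top_down_forward(attacked)
for i, (a, b) in enumerate(zip(out_clean, out_attacked)):
    print(f"stage {i}: shape={a.shape}, max |diff|={np.abs(a - b).max():.3g}")
```

Running this, the first two stages are unaffected by the checkerboard perturbation up to floating-point rounding, because every 2x2 average cancels it. Only the final, full-resolution stage sees the perturbation. This illustrates the mechanism behind the claimed defence against high-frequency noise. It is not a robustness guarantee, since a real attack need not be confined to the frequencies that the blurring removes.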