Explaining overthinking in Multi-Scale Dense networks

Why more computation does not always lead to better results

Abstract

Traditional convolutional neural networks exhibit an inherent limitation: they cannot adapt their computation to the input, even though some inputs require less computation than others to arrive at an accurate prediction. Early-exiting setups exploit this fact by spending only as much computation on a sample as is necessary before exiting it early. In an end-to-end trained convolutional neural network with multiple classifiers, one might expect deeper classifiers to perform better than shallow classifiers in every circumstance; after all, deeper layers build on the computation done by earlier layers. However, this is not always the case, and more computation can lead to worse results. This phenomenon, which has been dubbed overthinking, has been documented in several traditional convolutional neural networks with intermediate classifiers. It has been conjectured that it happens because later classifiers make use of more complex features which benefit from a larger receptive field. These later classifiers then claim to discern such features in regions of the image that do not contain them, causing them to misclassify images that shallow classifiers classify correctly. However, we have observed overthinking in Multi-Scale Dense networks, an end-to-end hand-tuned network optimized for early-exiting, for which the receptive-field argument given above does not hold due to its unique architecture. For this reason, in this thesis we attempt to explain overthinking in Multi-Scale Dense networks. We show that in general there seems to be no connection between what a classifier in a Multi-Scale Dense network learns and the data itself. This in turn suggests that overthinking does not take place due to specialization of the classifiers. Instead, we offer an alternative explanation for overthinking in the form of stochasticity inherent to the training process.
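
To illustrate the early-exit mechanism described above, the following is a minimal sketch of confidence-thresholded early exiting in PyTorch. The toy blocks, classifier heads, and the threshold value are illustrative assumptions for exposition only; they are not the Multi-Scale Dense network architecture studied in the thesis.

# Minimal sketch of confidence-thresholded early exiting (illustrative only;
# block sizes, heads, and threshold are assumptions, not the thesis's model).
import torch
import torch.nn as nn


class EarlyExitNet(nn.Module):
    def __init__(self, num_classes=10, threshold=0.9):
        super().__init__()
        # Three toy convolutional blocks standing in for progressively deeper features.
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(16, 32, 3, padding=1, stride=2), nn.ReLU()),
            nn.Sequential(nn.Conv2d(32, 64, 3, padding=1, stride=2), nn.ReLU()),
        ])
        # One classifier head after every block: global pooling followed by a linear layer.
        self.heads = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(c, num_classes))
            for c in (16, 32, 64)
        ])
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):
        # Evaluate classifiers from shallow to deep; exit as soon as one is confident.
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            probs = head(x).softmax(dim=-1)
            confidence, prediction = probs.max(dim=-1)
            if confidence.item() >= self.threshold:
                # Early exit: deeper blocks are never computed for this sample.
                return prediction, confidence
        # No intermediate classifier was confident enough: use the final prediction.
        return prediction, confidence


model = EarlyExitNet()
pred, conf = model(torch.randn(1, 3, 32, 32))
print(pred.item(), conf.item())

Overthinking, in this setting, would correspond to a sample that the first or second head classifies correctly being classified incorrectly by a deeper head, so that letting the sample continue past a confident shallow classifier yields a worse result despite the extra computation.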