Explaining overthinking in Multi-Scale Dense networks

Why more computation does not always lead to better results

Abstract

Traditional convolutional neural networks exhibit an inherent limitation: they cannot adapt their computation to the input, even though some inputs require less computation than others to arrive at an accurate prediction. Early-exiting setups exploit this fact by spending only as much computation on a sample as is necessary before exiting it early. In an end-to-end trained convolutional neural network with multiple classifiers, one might expect deeper classifiers to perform better than shallow classifiers in every circumstance; after all, deeper layers build on the computation done by earlier layers. However, this is not always the case, and more computation can lead to worse results. This phenomenon, which has been dubbed overthinking, has been documented in several traditional convolutional neural networks with intermediate classifiers. It has been conjectured that it happens because later classifiers make use of more complex features which benefit from a larger receptive field. These later classifiers then claim to discern such features in regions of the image that do not contain them, causing them to misclassify images that shallow classifiers classify correctly. However, we have observed overthinking in Multi-Scale Dense networks, an end-to-end hand-tuned network optimized for early-exiting, for which the receptive-field argument given above does not hold due to its unique architecture. For this reason, in this thesis we attempt to explain overthinking in Multi-Scale Dense networks. We show that in general there seems to be no connection between what a classifier in a Multi-Scale Dense network learns and the data itself. This in turn suggests that overthinking does not take place due to specialization of the classifiers. Instead, we offer an alternative explanation for overthinking in the form of stochasticity inherent to the training process.
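
To illustrate the early-exit mechanism described above, the following is a minimal sketch of confidence-thresholded early exiting in PyTorch. The toy blocks, classifier heads, and the threshold value are illustrative assumptions for exposition only; they are not the Multi-Scale Dense network architecture studied in the thesis.

# Minimal sketch of confidence-thresholded early exiting (illustrative only;
# block sizes, heads, and threshold are assumptions, not the thesis's model).
import torch
import torch.nn as nn


class EarlyExitNet(nn.Module):
    def __init__(self, num_classes=10, threshold=0.9):
        super().__init__()
        # Three toy convolutional blocks standing in for progressively deeper features.
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(16, 32, 3, padding=1, stride=2), nn.ReLU()),
            nn.Sequential(nn.Conv2d(32, 64, 3, padding=1, stride=2), nn.ReLU()),
        ])
        # One classifier head after every block: global pooling followed by a linear layer.
        self.heads = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(c, num_classes))
            for c in (16, 32, 64)
        ])
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):
        # Evaluate classifiers from shallow to deep; exit as soon as one is confident.
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            probs = head(x).softmax(dim=-1)
            confidence, prediction = probs.max(dim=-1)
            if confidence.item() >= self.threshold:
                # Early exit: deeper blocks are never computed for this sample.
                return prediction, confidence
        # No intermediate classifier was confident enough: use the final prediction.
        return prediction, confidence


model = EarlyExitNet()
pred, conf = model(torch.randn(1, 3, 32, 32))
print(pred.item(), conf.item())

Overthinking, in this setting, would correspond to a sample that the first or second head classifies correctly being classified incorrectly by a deeper head, so that letting the sample continue past a confident shallow classifier yields a worse result despite the extra computation.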