Regularizing AdaBoost to prevent overfitting on label noise

Abstract

In this work, we have shown that AdaBoost is prone to overfitting when the training set contains mislabeled objects. We proposed that this is in part because the error estimate used to weight base classifiers and (indirectly) objects is biased in this scenario. We have shown that an unbiased estimator can prevent the overfitting, but such an estimator is not easily obtained when the training set is untrustworthy. To remedy this, we introduced the ValidBoost algorithm, which tries to debias AdaBoost's error estimate by drawing a validation set from the noisy training set. The size of the validation set grows logarithmically with the number of iterations, reaching 50% of the entire training set in the final iteration. The error ValidBoost uses is a weighted average of the default training error and this validation error, with the weight of the validation error also growing logarithmically with the number of iterations.

We have seen that ValidBoost performs well in comparison to AdaBoost and LogitBoost in the presence of label noise. Compared to Bylander & Tate's method, a similar variant of AdaBoost that also uses validation sets to avoid overfitting, it performs very similarly when using decision stumps as base classifiers, and slightly better when using decision trees. Perhaps this is because we use more training data than Bylander & Tate do, and decision trees need this extra data to keep the base classifiers themselves from overfitting. The fact that ValidBoost samples a new validation set at every iteration also has positive effects: whereas the other methods may stagnate or stop early because a classifier has been trained with an error above the threshold imposed by boosting, ValidBoost draws a different training set in the next iteration, yielding a different error (possibly below the threshold) and hence no reason to stop early. The same effect can be seen when using non-deterministic base classifiers with AdaBoost.

Because of the relative success of ValidBoost, we reasoned about its core idea and how it can be translated to other domains. We made several suggestions for incorporating it into bagging, neural networks, or decision trees. Experiments with such an implementation in bagging (specifically random forests) showed promise, while experiments with an implementation in neural networks worked less well.
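To make the iteration scheme above concrete, the following is a minimal sketch of a ValidBoost-style loop, assuming binary labels, decision stumps as base classifiers, and particular logarithmic schedules for the validation-set fraction and the error-mixing weight. The exact functional forms and the helper name validboost_fit are illustrative assumptions, not the implementation described in this work.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def validboost_fit(X, y, n_rounds=50, seed=0):
    """Sketch of a ValidBoost-style boosting loop (NumPy arrays, labels in {0, 1}).

    Each round draws a fresh validation split from the (possibly noisy)
    training set and mixes the weighted training error with the validation
    error before computing the classifier weight, as in AdaBoost.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    w = np.full(n, 1.0 / n)                  # object weights (AdaBoost-style)
    y_pm = np.where(y == 1, 1.0, -1.0)       # labels mapped to {-1, +1}
    ensemble, alphas = [], []

    for t in range(1, n_rounds + 1):
        # Validation fraction and mixing weight both grow logarithmically,
        # reaching 0.5 and 1.0 respectively in the last round (assumed forms).
        lam = np.log(t + 1) / np.log(n_rounds + 1)
        frac = 0.5 * lam

        val_idx = rng.choice(n, size=max(1, int(frac * n)), replace=False)
        train_mask = np.ones(n, dtype=bool)
        train_mask[val_idx] = False

        clf = DecisionTreeClassifier(max_depth=1)    # decision stump
        clf.fit(X[train_mask], y[train_mask], sample_weight=w[train_mask])

        pred = clf.predict(X)
        miss = pred != y
        train_err = np.average(miss[train_mask], weights=w[train_mask])
        val_err = miss[~train_mask].mean()

        # Weighted average of training and validation error.
        err = np.clip((1 - lam) * train_err + lam * val_err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)

        ensemble.append(clf)
        alphas.append(alpha)

        # Standard AdaBoost object reweighting over the full training set.
        pred_pm = np.where(pred == 1, 1.0, -1.0)
        w *= np.exp(-alpha * y_pm * pred_pm)
        w /= w.sum()

    return ensemble, alphas
```

Prediction with such an ensemble would then be the sign of the alpha-weighted vote of the base classifiers, exactly as in standard AdaBoost.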
