Variable importance measures for random forests

Master Thesis (2021)
Author(s)

C.J.M. Boon (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

N. Parolya – Mentor (TU Delft - Statistics)

José Ferreira – Graduation committee member

Dorota Kurowicka – Graduation committee member (TU Delft - Applied Probability)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Cindy Boon
Publication Year
2021
Language
English
Graduation Date
09-07-2021
Awarding Institution
Delft University of Technology
Programme
Applied Mathematics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Measuring variable importance is often difficult: models can be complex, and covariates can interact with each other and be correlated. This study addresses two questions. First, what should the theoretical measure of variable importance be under a given data-generating model? Second, what are the best estimates of these theoretical measures? Two theoretical measures and several corresponding estimates are presented, one of which is the well-known random forests variable importance measure (Breiman, 2001). A simulation study on both linear and nonlinear models examines which estimates of the variable importance measures perform best for given data-generating models. Most measures struggle when covariates are correlated, but their performance improves when the number of split variables is tuned.
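
As a rough illustration of the permutation idea behind the random forests variable importance measure, the sketch below computes the increase in mean squared error after permuting one covariate at a time. It is a minimal sketch under stated assumptions and is not taken from the thesis: the linear data-generating model, the correlation of 0.8 between the first two covariates, and the names `simulate`, `true_regression` and `permutation_importance` are all hypothetical, and the known regression function stands in for a fitted random forest so that the example needs only the Python standard library.

```python
import random


def permutation_importance(predict, X, y, rng):
    """Increase in mean squared error after permuting each column of X."""
    n = len(X)
    p = len(X[0])

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / n

    baseline = mse(X)
    importances = []
    for j in range(p):
        # Shuffle column j only, which breaks its link with y and the other columns.
        column = [row[j] for row in X]
        rng.shuffle(column)
        permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(mse(permuted) - baseline)
    return importances


def simulate(n, rho, rng):
    """Hypothetical linear model: y = 2*x1 + x3 + noise, with corr(x1, x2) = rho."""
    X, y = [], []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = rho * x1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        x3 = rng.gauss(0, 1)
        X.append([x1, x2, x3])
        y.append(2 * x1 + x3 + rng.gauss(0, 1))
    return X, y


def true_regression(row):
    # Assumption: the known regression function replaces a fitted forest.
    return 2 * row[0] + row[2]


rng = random.Random(0)
X, y = simulate(2000, 0.8, rng)
importances = permutation_importance(true_regression, X, y, rng)
print(importances)
```

Because the stand-in predictor ignores the second covariate, its permutation importance is zero here even though it is strongly correlated with the first. A fitted forest may split on such a correlated covariate, so its estimated importances can differ from this idealised case.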
