Variable importance measures for random forests

Master Thesis (2021)

Author(s)

C.J.M. Boon (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

N. Parolya – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

José Ferreira – Graduation committee member

D. Kurowicka – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Correlation, Random Forests, Variable importance measures

To reference this document use

https://resolver.tudelft.nl/uuid:bc1b0369-efea-47e6-95e1-ece779ce736a

More Info

Publication Year

2021

Language

English

Graduation Date

09-07-2021

Awarding Institution

Delft University of Technology

Programme

Applied Mathematics

Collections

thesis

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Measuring variable importance is often difficult: among other things, models can be complex, and covariates can interact with each other and be correlated. This study addresses two questions. First, what should the theoretical measure of variable importance be under a given data-generating model? Second, what are the best estimates of these theoretical measures? Two theoretical measures and several corresponding estimates are presented, one of which is the well-known random forests variable importance measure (Breiman, 2001). A simulation study is carried out for both linear and nonlinear models to determine which estimates perform best for given data-generating models. Most measures struggle when covariates are correlated, but their performance improves when the number of split variables is tuned.
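As a rough illustration of the permutation principle behind Breiman's (2001) measure, the sketch below permutes one covariate at a time in held-out data and records the resulting increase in mean squared error. It is not the thesis's code. To stay dependency-light (NumPy only), an ordinary least-squares fit stands in for the random forest. The data-generating model (Y = 2 X1 + X2 + 0 X3 + noise, with corr(X1, X3) = 0.8) and all function names are hypothetical choices made for this example.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with intercept; a stand-in for a random forest (assumption)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Return a prediction function so any fitted model could be plugged in instead.
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each column of X is permuted in turn."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((y - predict(X)) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link with y while keeping its marginal distribution.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean((y - predict(X_perm)) ** 2) - base_mse)
        scores[j] = np.mean(increases)
    return scores


def simulate(n, rho, rng):
    """Hypothetical model: Y = 2*X1 + 1*X2 + 0*X3 + noise, with corr(X1, X3) = rho."""
    cov = np.array([[1.0, 0.0, rho],
                    [0.0, 1.0, 0.0],
                    [rho, 0.0, 1.0]])
    X = rng.multivariate_normal(np.zeros(3), cov, size=n)
    y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)
    return X, y


rng = np.random.default_rng(42)
X_train, y_train = simulate(2000, rho=0.8, rng=rng)
X_test, y_test = simulate(2000, rho=0.8, rng=rng)

predict = fit_ols(X_train, y_train)
imp = permutation_importance(predict, X_test, y_test)
print(dict(zip(["X1", "X2", "X3"], imp.round(2))))
```

With this linear stand-in, the scores come out close to 2 * beta_j^2 * Var(X_j), so the correlated but irrelevant X3 receives an importance near zero. A random forest fitted to the same data need not behave this way, and that case is the one the study examines. If scikit-learn is available, `RandomForestRegressor` could replace `fit_ols`; its `max_features` parameter plays the role of the number of split variables (`mtry` in R's randomForest).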

Files

Master_thesis_report_CJM_Boon.... (pdf | 1.69 Mb)

License info not available