Efficient and effective feature discovery for CART decision tree model
A.B.C. Bien (TU Delft - Electrical Engineering, Mathematics and Computer Science)
R. Hai – Mentor (TU Delft - Web Information Systems)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
A common challenge in feature discovery and feature selection is the trade-off between effectiveness and efficiency. This paper proposes a method for ranking features during feature discovery that is both efficient and effective.
The aim is to improve feature discovery by estimating the overall utility of each feature. Features are first ranked by individual characteristics, such as the correlation coefficient, Gini impurity, and information gain. For each characteristic, the approach then calculates the likelihood that a feature is selected by a wrapper feature selection technique, given its rank on that characteristic. These likelihoods are recorded and combined into an estimate of the feature's overall utility, which is then used to rank all features.
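The idea of turning per-characteristic rankings into a single utility score can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`estimate_likelihoods`, `rank_by_utility`), the Laplace smoothing, and the choice to combine likelihoods by a plain average are all assumptions made for illustration, since the abstract does not specify how the likelihoods are combined.

```python
from collections import defaultdict


def estimate_likelihoods(records):
    """Estimate P(selected by wrapper | rank) per characteristic.

    records: iterable of (characteristic, rank, was_selected) tuples collected
    from earlier wrapper feature selection runs (hypothetical input format).
    Laplace smoothing is an assumption, used to avoid 0 or 1 estimates.
    """
    counts = defaultdict(lambda: [0, 0])  # (characteristic, rank) -> [selected, total]
    for characteristic, rank, was_selected in records:
        entry = counts[(characteristic, rank)]
        entry[0] += int(was_selected)
        entry[1] += 1

    likelihoods = defaultdict(dict)
    for (characteristic, rank), (selected, total) in counts.items():
        likelihoods[characteristic][rank] = (selected + 1) / (total + 2)
    return likelihoods


def rank_by_utility(feature_ranks, likelihoods, prior=0.5):
    """Rank features by the average of their per-characteristic likelihoods.

    feature_ranks: {feature: {characteristic: rank}}.
    Averaging is an assumed combination rule; unseen ranks fall back to `prior`.
    """
    utilities = {}
    for feature, ranks in feature_ranks.items():
        probs = [likelihoods.get(c, {}).get(r, prior) for c, r in ranks.items()]
        utilities[feature] = sum(probs) / len(probs) if probs else prior
    order = sorted(utilities, key=utilities.get, reverse=True)
    return order, utilities


# Toy history: how often features at each rank were kept by a wrapper method.
history = [
    ("correlation", 1, True), ("correlation", 1, True),
    ("correlation", 2, True), ("correlation", 2, False),
    ("correlation", 3, False),
    ("information_gain", 1, True),
    ("information_gain", 2, False),
    ("information_gain", 3, False),
]
# Ranks of three hypothetical candidate features on each characteristic.
candidates = {
    "age": {"correlation": 1, "information_gain": 2},
    "income": {"correlation": 2, "information_gain": 1},
    "zip": {"correlation": 3, "information_gain": 3},
}
order, utilities = rank_by_utility(candidates, estimate_likelihoods(history))
print(order)
```

The expensive wrapper runs are needed only once, to build the likelihood table. Scoring a new candidate feature afterwards requires only its ranks on the cheap characteristics, which is where the efficiency gain would come from under these assumptions.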