Filtering Knowledge: A Comparative Analysis of Information-Theoretical-Based Feature Selection Methods

Bachelor Thesis (2023)
Author(s)

K.V. Vasilev (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Asterios Katsifodimos – Mentor (TU Delft - Web Information Systems)

A. Ionescu – Mentor (TU Delft - Web Information Systems)

Elvin Isufi – Graduation committee member (TU Delft - Multimedia Computing)

Publication Year
2023
Language
English
Copyright
© 2023 Kiril Vasilev
Graduation Date
28-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The data used in machine learning algorithms strongly influences the algorithms' capabilities. Feature selection techniques choose a subset of columns that best serves a given learning goal. There is a wide variety of feature selection methods; the ones we cover in this comparative analysis belong to the information-theoretical-based family. We evaluate MIFS, MRMR, CIFE, and JMI using the machine learning algorithms Logistic Regression, XGBoost, and Support Vector Machines.
Multiple datasets with a variety of feature types are used during evaluation. We find that MIFS and MRMR are 2-4 times faster than CIFE and JMI. MRMR and JMI choose columns that lead to significantly higher accuracy and lower root mean squared error earlier in the selection process. The results we present here can help data scientists pick a suitable feature selection method for the datasets they work with.
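To illustrate the kind of criterion these methods optimize, the following is a minimal sketch of greedy MRMR-style selection (relevance to the label minus mean redundancy with already chosen columns), not the thesis code. The function name mrmr_select, the use of scikit-learn's mutual_info_classif and mutual_info_regression estimators, and the assumption of numeric features with a classification target are illustrative choices.

import pandas as pd
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X: pd.DataFrame, y, k: int) -> list:
    """Greedily pick k columns that maximize relevance minus mean redundancy."""
    # Relevance: mutual information between each candidate column and the label.
    relevance = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
    selected, remaining = [], list(X.columns)
    while remaining and len(selected) < k:
        scores = {}
        for col in remaining:
            if selected:
                # Redundancy: mean mutual information with the already chosen columns.
                redundancy = mutual_info_regression(X[selected], X[col], random_state=0).mean()
            else:
                redundancy = 0.0
            scores[col] = relevance[col] - redundancy
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

One could then train, for example, a Logistic Regression model on X[mrmr_select(X, y, 10)] and track accuracy as more columns are added, which mirrors the evaluation setup described in the abstract.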
