Filtering Knowledge: A Comparative Analysis of Information-Theoretical-Based Feature Selection Methods
K.V. Vasilev (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Asterios Katsifodimos – Mentor (TU Delft - Web Information Systems)
A. Ionescu – Mentor (TU Delft - Web Information Systems)
Elvin Isufi – Graduation committee member (TU Delft - Multimedia Computing)
Abstract
The data fed to a machine learning algorithm strongly influences its performance. Feature selection techniques choose a subset of columns suited to a given learning goal. Although there is a wide variety of feature selection methods, this comparative analysis covers those from the information-theoretical-based family. We evaluate MIFS, MRMR, CIFE, and JMI using three machine learning algorithms: Logistic Regression, XGBoost, and Support Vector Machines.
Multiple datasets with a variety of feature types are used during evaluation. We find that MIFS and MRMR are 2-4 times faster than CIFE and JMI, while MRMR and JMI choose columns that reach significantly higher accuracy and lower root mean squared error earlier in the selection process. The results presented here can help data scientists pick the right feature selection method for their datasets.
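To illustrate the kind of criterion these methods share, below is a minimal sketch of greedy MRMR selection: each step picks the feature with the highest relevance (mutual information with the target) minus its mean redundancy (mutual information with already selected features). This is an illustrative sketch using scikit-learn's mutual information estimators and a synthetic dataset, not the implementation evaluated in the thesis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, k, random_state=0):
    """Greedy MRMR sketch: score(j) = relevance(j) - mean redundancy(j)."""
    n_features = X.shape[1]
    # Relevance: MI between each feature and the (discrete) target.
    relevance = mutual_info_classif(X, y, random_state=random_state)
    selected, remaining = [], list(range(n_features))
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in remaining:
            if selected:
                # Redundancy: mean MI between candidate j and each
                # already selected (continuous) feature.
                redundancy = np.mean(
                    [mutual_info_regression(X[:, [s]], X[:, j],
                                            random_state=random_state)[0]
                     for s in selected])
            else:
                redundancy = 0.0
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        remaining.remove(best)
    return selected

X, y = make_classification(n_samples=200, n_features=8,
                           n_informative=3, random_state=0)
print(mrmr(X, y, k=3))
```

CIFE and JMI follow the same greedy template but replace the redundancy term with conditional or joint mutual information terms, which is why they are more expensive to compute.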