A comparative study for using PCA, LDA, GDA, and Lasso for dimensionality reduction before classification algorithms

Bachelor Thesis (2023)
Author(s)

D. Anceaux (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Katsifodimos – Mentor (TU Delft - Web Information Systems)

Andra Ionescu – Mentor (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Duyemo Anceaux
Publication Year
2023
Language
English
Graduation Date
25-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

As the volume of data collected each day grows, processing it becomes increasingly expensive. Dimensionality reduction lowers these costs by reducing the number of features per instance in a given dataset.

In this paper, we compare four dimensionality reduction methods: the feature extraction methods PCA, LDA, and GDA, and the feature selection method Lasso. We mainly examine how the number of features these methods retain affects the accuracy of several classification algorithms, and how long each method takes to run.
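The comparison described above follows a reduce-then-classify pipeline. A minimal sketch of that setup using scikit-learn is shown below; the dataset, feature count, and classifier are illustrative assumptions, not the thesis's experimental configuration, and GDA is omitted because scikit-learn ships no kernel discriminant analysis.

```python
# Illustrative sketch (not the thesis code): each method reduces the data
# to the same number of features, then a classifier is trained and scored.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reducers = {
    # PCA: unsupervised feature extraction
    "PCA": PCA(n_components=9),
    # LDA: supervised; keeps at most n_classes - 1 components (9 for 10 digits)
    "LDA": LinearDiscriminantAnalysis(n_components=9),
    # Lasso: feature selection via zeroed coefficients; treating the class
    # label as a regression target here is a simplification for the sketch
    "Lasso": SelectFromModel(Lasso(alpha=0.01), max_features=9),
}

scores = {}
for name, reducer in reducers.items():
    model = make_pipeline(StandardScaler(), reducer,
                          LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```

Varying `n_components` (or `max_features`) and the downstream classifier, and timing each fit, yields the accuracy-versus-features and runtime comparisons the paper reports.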

Our research highlights LDA as a highly effective method for aggressively reducing the dimensionality of data used with logistic regression and Support Vector Machines (SVMs). Lasso proved the preferred choice when the training dataset is small or when classifying with random forests. Principal Component Analysis (PCA) occupied a middle ground between LDA's strength in aggressive reduction and Lasso's accuracy retention. GDA (with a linear kernel function) turned out to be significantly slower than the other methods, while its results were mostly on par with LDA's.

Files

BEP_Paper_5_.pdf
(pdf | 0.329 MB)
License info not available