Principal Component Analysis of Education-Related Data Sets

Bachelor Thesis (2020)
Author(s)

T.P.K. Nguyen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

C. Vuik – Mentor (TU Delft - Numerical Analysis)

K.P. Hart – Graduation committee member (TU Delft - Analysis)

E.D. Wobbes – Mentor (TU Delft - Numerical Analysis)

Erik Fleur – Mentor (Dienst Uitvoering Onderwijs)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2020
Language
English
Graduation Date
21-07-2020
Awarding Institution
Delft University of Technology
Programme
Applied Mathematics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Principal Component Analysis (PCA) is a mathematical technique valued for reducing the dimension of a data set while keeping the most important information, which makes it well suited to handling large amounts of data. This thesis addresses two questions: which variables influence a pupil's attainment test score according to linear regression, and does PCA lead to better linear regression models? The data were provided by DUO, the Dutch Executive Agency for Education, and contain information about pupils who completed the attainment test in 2008-2013. The thesis starts with a brief description of the data set and some background information on PCA. The data are preprocessed before linear regression is applied. In a linear model built with all variables, the largest absolute coefficients belong to the teachers' secondary school recommendations. Applying PCA gives insight into which variables are (likely) dependent on each other, not only in the sense of linear dependence but also in how they influence each other more generally. PCA also indicates which variables are most likely to have a significant impact. When the data set contains no linearly dependent variables, PCA may give worse-fitting models; these models are nevertheless better than models built from randomly chosen variables.
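
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis's code and does not use the DUO data: the synthetic data, the variable names, and the helper functions `pca` and `r_squared` are assumptions made for illustration only. The sketch builds a data set with one linearly dependent column, computes the principal components, and compares the fit of a linear regression on all variables with regressions on the leading components.

```python
import numpy as np


def pca(X, k):
    """Return the scores on the first k principal components and the
    explained-variance ratio of every component (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                      # centre each variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s**2 / (X.shape[0] - 1)          # variance along each component
    ratio = variances / variances.sum()
    return Xc @ Vt[:k].T, ratio


def r_squared(A, y):
    """Fit least squares with an intercept and return R^2 (hypothetical helper)."""
    A1 = np.column_stack([np.ones(len(A)), A])
    coef, *_ = np.linalg.lstsq(A1, y, rcond=None)
    resid = y - A1 @ coef
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


# Synthetic stand-in data (assumption): 5 variables, the last one is an
# exact linear combination of the first two.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 5))
X[:, 4] = X[:, 0] + X[:, 1]
y = 2.0 * X[:, 0] - X[:, 2] + rng.normal(scale=0.5, size=n)

scores, ratio = pca(X, 4)

r2_all = r_squared(X, y)               # regression on all original variables
r2_pca4 = r_squared(scores, y)         # regression on 4 principal components
r2_pca2 = r_squared(scores[:, :2], y)  # regression on 2 principal components

print("explained variance ratio:", np.round(ratio, 4))
print("R^2, all variables:", r2_all)
print("R^2, 4 components: ", r2_pca4)
print("R^2, 2 components: ", r2_pca2)
```

In this sketch the last explained-variance ratio is numerically zero, which is how PCA exposes the linearly dependent column, and four components reproduce the fit of the full model. Keeping fewer components can only lower the fit on the same data, which is consistent with the abstract's remark that PCA may give worse-fitting models once no linear dependence remains.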

Files

Bep_ThaoNguyen.pdf
(pdf | 0.665 Mb)
License info not available