Principal Component Analysis of Education-Related Data Sets


Abstract

Principal Component Analysis (PCA) is a mathematical technique that reduces the dimension of a data set while retaining the most important information, which makes it well suited to handling large amounts of data. This thesis addresses two questions: which variables influence a pupil's attainment test score, as estimated by linear regression, and does PCA lead to better linear regression models? The data are provided by DUO, the Dutch Executive Agency for Education, and contain information about pupils who completed the attainment test in 2008-2013. The thesis starts with a brief description of the data set and some background on PCA. The data are preprocessed before linear regression is applied. A linear model fitted with all variables has its largest absolute coefficients on the teachers' secondary school recommendations. Applying PCA gives insight into which variables are (likely) dependent on each other, not only in the sense of linear dependence but also in how they influence each other more generally. PCA also indicates which variables are most likely to have a significant impact. When the data set is free of linearly dependent variables, PCA may give worse-fitting models; however, these models still fit better than models built from randomly chosen variables.
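
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis's code and does not use the DUO data: the data are synthetic, and the function names (fit_r2, pca_scores), the sample size, the number of variables, and the number of retained components are all assumptions chosen for illustration. The sketch fits a linear model on all variables, on the first k principal components, and on k randomly chosen variables, and reports the coefficient of determination for each.

```python
import numpy as np


def fit_r2(X, y):
    """Fit ordinary least squares with an intercept and return R^2."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def pca_scores(X, k):
    """Project standardized X onto its first k principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:k].T


# Synthetic stand-in data (assumption): 8 variables, two of them nearly dependent.
rng = np.random.default_rng(0)
n, p, k = 500, 8, 3
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)
y = X @ rng.normal(size=p) + rng.normal(size=n)

scores = pca_scores(X, k)
random_cols = rng.choice(p, size=k, replace=False)

r2_full = fit_r2(X, y)                    # all variables
r2_pca = fit_r2(scores, y)                # first k principal components
r2_random = fit_r2(X[:, random_cols], y)  # k randomly chosen variables

print(f"all variables:       R^2 = {r2_full:.3f}")
print(f"first {k} components:  R^2 = {r2_pca:.3f}")
print(f"{k} random variables:  R^2 = {r2_random:.3f}")
```

Because the principal component scores are linear combinations of the original variables, the reduced model can never fit the training data better than the full model. How it compares with randomly chosen variables depends on the data; the sketch guarantees no particular ordering there.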
