Automatic feature augmentation ranking: XGBoost

Bachelor Thesis (2022)
Author(s)

O.L.C. Neut (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Ionescu – Mentor (TU Delft - Web Information Systems)

R. Hai – Mentor (TU Delft - Web Information Systems)

D.H.J. Epema – Graduation committee member (TU Delft - Data-Intensive Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
Publication Year
2022
Language
English
Graduation Date
20-06-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Automated machine learning is a subfield of machine learning that automates the common procedures involved in predictive tasks. One such procedure is automatic data augmentation, in which the existing data is enriched to improve model performance. In relational data repositories, data is stored in normalized form across multiple tables. This makes augmentation difficult, because joining all tables and then performing feature selection is highly inefficient. This paper presents AFAR, an approach for efficient and effective automated feature augmentation that ranks candidate joins in a data repository. An experimental evaluation that validates the approach's capabilities is also presented.
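The abstract does not spell out how AFAR scores a candidate join. As a hypothetical illustration of the general idea (ranking candidate joins instead of materializing every join and then running feature selection), the Python sketch below scores each candidate table by the absolute Pearson correlation between its joined feature and the target. The correlation proxy, the function names `pearson` and `rank_candidate_joins`, and the one-feature-per-table data layout are all assumptions made for illustration. This is not the thesis's method; the title suggests XGBoost is involved, and this sketch does not use it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_candidate_joins(base_keys, target, candidates, fill=0.0):
    """Hypothetical ranking: score each candidate table by |corr(joined feature, target)|.

    base_keys:  join-key value for each row of the base table
    target:     label for each row of the base table
    candidates: {table_name: {join_key: feature_value}} (assumed layout)
    Base rows without a match get `fill`, i.e. a left join with simple imputation.
    """
    scores = {}
    for name, table in candidates.items():
        # Left-join the candidate's single feature onto the base rows.
        joined = [table.get(k, fill) for k in base_keys]
        scores[name] = abs(pearson(joined, target))
    # Highest-scoring joins first; only these would be materialized.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    keys = [1, 2, 3, 4]
    y = [0.0, 1.0, 0.0, 1.0]
    cands = {
        "weather": {1: 10.0, 2: 20.0, 3: 10.0, 4: 20.0},  # tracks y exactly
        "noise": {1: 5.0, 2: 5.0, 3: 6.0, 4: 4.0},
    }
    print(rank_candidate_joins(keys, y, cands))  # "weather" ranks first
```

A cheap per-join score like this avoids building the full join of all tables, which is the inefficiency the abstract points out; a model-based score (for example, feature importance from a trained model) would be more expressive but costlier to compute per candidate.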

Files

Research_Paper_2022.pdf
(pdf | 0.544 Mb)
License info not available