A. Ionescu | TU Delft Repository

Human Interaction in Tabular Data Augmentation in Data Science Workflows

Master thesis (2024) - Z.F. Mouw (author), Asterios Katsifodimos (mentor), E.A. Aivaloglou (mentor), Andra Ionescu (mentor), N.M. Gürel (graduation committee member)

The advancement of artificial intelligence (AI) has increased the demand for both a greater volume and a higher quality of data. In many companies, data is dispersed across multiple tables, yet AI models typically require data in a single table format. This necessitates the merging ...

Automatic feature discovery

A comparative study between filter and wrapper feature selection techniques

Bachelor thesis (2023) - A.B. Mânăstireanu (author), A. Ionescu (mentor), Asterios Katsifodimos (mentor), Elvin Isufi (graduation committee member)

The curse of dimensionality is a common challenge in machine learning, and feature selection techniques are commonly employed to address this issue by selecting a subset of relevant features. However, there is no consistently superior approach for choosing the most significant su ...

Filtering Knowledge: A Comparative Analysis of Information-Theoretical-Based Feature Selection Methods

Bachelor thesis (2023) - K.V. Vasilev (author), Asterios Katsifodimos (mentor), A. Ionescu (mentor), Elvin Isufi (graduation committee member)

The data used in machine learning algorithms strongly influences the algorithms' capabilities. Feature selection techniques can choose a set of columns that meet a certain learning goal. There is a wide variety of feature selection methods; however, the ones we cover in this comp ...

Encoding methods for categorical data

A comparative analysis for linear models, decision trees, and support vector machines

Bachelor thesis (2023) - A. Udilă (author), A. Ionescu (mentor), Asterios Katsifodimos (mentor), Elvin Isufi (graduation committee member)

This paper presents a comprehensive evaluation and comparison of encoding methods for categorical data in the context of machine learning. The study focuses on five popular encoding techniques: one-hot, ordinal, target, catboost, and count encoders. These methods are evaluated us ...

Data-Driven Empirical Analysis of Correlation-Based Feature Selection Techniques

Bachelor thesis (2023) - I. Buşe (author), Andra Ionescu (mentor), Asterios Katsifodimos (mentor), Elvin Isufi (graduation committee member)

Thus far the democratization of machine learning, which resulted in the field of AutoML, has focused on the automation of model selection and hyperparameter optimization. Nevertheless, the need for high-quality databases to increase performance has sparked interest in correlation ...

A comparative study for using PCA, LDA, GDA, and Lasso for dimensionality reduction before classification algorithms

Bachelor thesis (2023) - D. Anceaux (author), A Katsifodimos (mentor), A. Ionescu (mentor)

Since every day more and more data is collected, it becomes more and more expensive to process. To reduce these costs, you can use dimensionality reduction to reduce the number of features per instance in a given dataset.

In this paper, we will compare four possible met ...

From Feature Selection to Data Augmentation: the ADA Algorithm

Bachelor thesis (2022) - E. Cruset Pla (author), R. Hai (mentor), Andra Ionescu (mentor), D.H.J. Epema (graduation committee member)

The democratization of data science, and in particular of the machine learning pipeline, has focused on the automation of model selection, feature processing, and hyperparameter tuning. Nevertheless, the need for high-quality data for increased performance has sparked interest in ...

PCADA: Partial Correlation Aware Data Augmentation for random forest classifier

Bachelor thesis (2022) - Oskar Lorek (author), A. Ionescu (mentor), R. Hai (mentor), D.H.J. Epema (graduation committee member)

Machine learning models require rich, quality data sets to achieve high accuracy. With the current exponential growth of generated data, it is becoming increasingly hard to prepare high-quality tables within a reasonable time frame. To combat this issue, automated data augmentation ...

Automatic feature augmentation ranking: XGBoost

Bachelor thesis (2022) - O.L.C. Neut (author), Andra Ionescu (mentor), R. Hai (mentor), D.H.J. Epema (graduation committee member)

Automatic machine learning is a subfield of machine learning that automates the common procedures faced in predictive tasks. One such procedure is automatic data augmentation, where one desires to enrich the existing data to increase model performance. In relationa ...

An exploratory journey to combine schema matchers for better relevance prediction

Master thesis (2022) - W.H. Wang (author), A Katsifodimos (mentor), G.J.P.M. Houben (graduation committee member), Y. Chen (graduation committee member), A. Ionescu (mentor)

The speed of data growth has increased exponentially over the past decade, highlighting the need of modern organizations for data discovery systems. Several (automated) schema matching approaches have been proposed to find related data, exploiting different parts of schema in ...