Human Interaction in Tabular Data Augmentation in Data Science Workflows

Master Thesis (2024)

Author(s)

Z.F. Mouw (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Asterios Katsifodimos – Mentor (TU Delft - Web Information Systems)

E.A. Aivaloglou – Mentor (TU Delft - Web Information Systems)

A. Ionescu – Mentor (TU Delft - Web Information Systems)

N.M. Gürel – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Machine Learning (ML), Data Augmentation, HCI

To reference this document use:

https://resolver.tudelft.nl/uuid:daef797f-0ff7-4b44-9a79-e659d4bede4d

Publication Year

2024

Language

English

Graduation Date

26-04-2024

Awarding Institution

Delft University of Technology

Programme

Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The advancement of artificial intelligence (AI) has increased the demand for data, in both volume and quality. In many companies, data is dispersed across multiple tables, yet AI models typically require a single table as input. The tables must therefore be merged and the most useful features selected for the model, a process known as Tabular Data Augmentation (TDA). As interest in TDA has grown, automated tools have been developed to streamline this process. However, these state-of-the-art tools often make assumptions about user workflows that may not match the actual needs of data specialists, which can make them efficient yet not fully user-friendly. Additionally, without thorough evaluation through user studies, these tools may overlook critical steps in the TDA process.
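As a rough illustration of the two steps that TDA combines (merging tables, then selecting features), the sketch below joins two small tables and ranks the resulting columns by their absolute Pearson correlation with a target. It is not the method used in the thesis or in any existing TDA tool. The table contents, the column names (`id`, `churned`, `support_tickets`, `region_code`), and the helper functions `left_join`, `pearson`, and `rank_features` are all hypothetical, and correlation ranking is only one of many possible feature selection criteria.

```python
from math import sqrt


def left_join(base, other, key):
    """Left-join two tables (lists of dicts) on a shared key column."""
    index = {row[key]: row for row in other}
    joined = []
    for row in base:
        merged = dict(row)
        # Copy every non-key column of the matching row, if one exists.
        for col, value in index.get(row[key], {}).items():
            if col != key:
                merged[col] = value
        joined.append(merged)
    return joined


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_features(table, target, exclude=()):
    """Rank columns by absolute correlation with the target, best first.

    Assumes every row has a numeric value for every column.
    """
    columns = [c for c in table[0] if c != target and c not in exclude]
    target_values = [row[target] for row in table]
    scores = {
        c: abs(pearson([row[c] for row in table], target_values))
        for c in columns
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical base table (with the target) and one candidate table to join.
customers = [
    {"id": 1, "churned": 0},
    {"id": 2, "churned": 1},
    {"id": 3, "churned": 0},
    {"id": 4, "churned": 1},
]
usage = [
    {"id": 1, "support_tickets": 0, "region_code": 2},
    {"id": 2, "support_tickets": 5, "region_code": 1},
    {"id": 3, "support_tickets": 1, "region_code": 1},
    {"id": 4, "support_tickets": 4, "region_code": 2},
]

joined = left_join(customers, usage, "id")
ranked = rank_features(joined, "churned", exclude=("id",))
print(ranked)
```

In this toy data, `support_tickets` tracks the target closely while `region_code` does not, so the former ranks first. An automated tool would make such join and ranking decisions on the user's behalf; a human-in-the-loop variant would let the data specialist inspect or override them.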

This thesis has two main parts. The first part uncovers the assumptions and oversights in current TDA research through an exhaustive review of recent literature, followed by interviews with 19 data specialists. These interviews aim to verify the identified assumptions and to reveal elements missing from state-of-the-art research. The second part focuses on creating a new tool that meets the requirements derived from the validated assumptions and the discovered gaps. This tool is then assessed in evaluation interviews.

The findings indicate that data specialists prefer a TDA tool that offers more control over, and deeper insight into, the data augmentation process. To meet these preferences, Human in the Loop AutoTDA was developed to provide the desired functionality. Feedback from the evaluation phase confirmed that data specialists find Human in the Loop AutoTDA suitable for their TDA workflows, marking a significant advancement in the field.

Files

Zeger_Mouw_Thesis.pdf

(pdf | 1.82 MB)

License info not available