Analysing Data Features on Algorithmic Fairness in Machine Learning

Comparing the sensitivity of data features under fairness properties between different sectors

Bachelor Thesis (2024)
Author(s)

P. Markesinis (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Lukina – Mentor (TU Delft - Algorithmics)

Christoph Lofi – Graduation committee member (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
26-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Fairness in machine learning is an increasingly important yet complex issue, especially as these algorithms are integrated into critical decision-making processes across various sectors. This research focuses on the impact of features under fairness properties across multiple sectors. The primary research question addressed is: “Which data features are the most sensitive when monitoring fairness properties on criminal data, and how do these features perform when monitoring fairness properties on data from different sectors?” The study examines features such as age, race, gender, and educational level across datasets from the criminal justice, healthcare, finance, and education sectors. Utilizing logistic regression models and a proposed dynamic monitoring algorithm, the sensitivity of features to fairness violations is assessed for the Demographic Parity and Equal Opportunity properties. The findings indicate that age is the most sensitive feature in almost all sectors, highlighting inherent biases and the necessity for sector-specific fairness considerations. However, statistical analysis revealed that the differences in sensitivity values across sectors were not statistically significant, suggesting that the observed patterns are not strong enough to be deemed conclusive.
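The two fairness properties named in the abstract have standard group-level definitions: Demographic Parity compares positive-prediction rates between groups, and Equal Opportunity compares true-positive rates. The sketch below illustrates these definitions for a binary classifier and a binary sensitive feature; it is a minimal illustration of the metrics, not the thesis's dynamic monitoring algorithm, and all function names are assumptions.

```python
import numpy as np

def demographic_parity_diff(y_pred, group):
    """Absolute gap in positive-prediction rates between the two groups."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rates = [y_pred[group == g].mean() for g in (0, 1)]
    return abs(rates[0] - rates[1])

def equal_opportunity_diff(y_true, y_pred, group):
    """Absolute gap in true-positive rates (recall) between the two groups."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tprs = [y_pred[(group == g) & (y_true == 1)].mean() for g in (0, 1)]
    return abs(tprs[0] - tprs[1])

# Toy example: group 0 receives positive predictions at rate 0.5,
# group 1 at rate 0.25, so the Demographic Parity gap is 0.25.
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
group  = [0, 0, 0, 0, 1, 1, 1, 1]
print(demographic_parity_diff(y_pred, group))          # 0.25
print(equal_opportunity_diff(y_true, y_pred, group))   # 0.5
```

A larger gap for one feature than another (e.g. age vs. gender) is what the abstract refers to as that feature being more "sensitive" to fairness violations.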
