Investigating Data Collection and Reporting Practices of Human Annotations in Societally Impactful Machine Learning Applications

A Systematic Review of Top-Cited IEEE Access Papers

Bachelor Thesis (2023)
Author(s)

A. Ibrahim (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Cynthia C.S. Liem – Mentor (TU Delft - Multimedia Computing)

Andrew Demetriou – Mentor (TU Delft - Multimedia Computing)

F. Broz – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Ahmed Ibrahim
Publication Year
2023
Language
English
Graduation Date
28-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This systematic review investigates the practices and implications of human annotation in machine learning (ML) research. Analyzing a selection of 100 papers from the journal IEEE Access, the study examines the data collection and reporting methods employed. The findings reveal a widespread lack of standardization and formalization in the annotation process: key details such as annotation sources, the number of annotators, and formal annotation instructions are frequently left unreported, which may compromise the quality and effectiveness of the resulting ML algorithms. Domain-specific implications are discussed, highlighting the need for thorough annotation practices in areas such as medical diagnostics, language processing, and intelligent vehicle systems. The study contributes to the field by emphasizing the importance of standardized procedures and transparency in ML research. Future research is recommended to develop systematic annotation methodologies and to examine the impact of subpar annotation practices on data quality.
