Annotation Practices in Societally Impactful Machine Learning Applications
What are these automated systems actually trained on?
D. Košutić (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Andrew Demetriou – Mentor (TU Delft - Multimedia Computing)
Cynthia C.S. Liem – Mentor (TU Delft - Multimedia Computing)
J. Yang – Graduation committee member (TU Delft - Web Information Systems)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
The output of machine learning (ML) models can only be as good as the data that is fed into them. Because of this, it is important to ensure data quality when constructing datasets for ML models. This is especially true of human-labeled data, whose quality can be difficult to standardize and assess. To assess the annotation practices behind human-labeled data in the field of machine learning, this paper investigates the datasets used in the highest-cited papers at the AAAI Conference on Artificial Intelligence, an influential machine learning conference. Datasets were extracted from 75 papers across three overlapping publication periods, and the top 20 datasets from each period were evaluated. The results showed that the majority of datasets omit or underreport important annotation practices, particularly details about the annotators and the annotation process. This is a concern for the conference and the field more broadly, as the most influential papers may be building their machine learning algorithms on low-quality data. However, there is some cause for optimism, as more recent papers use datasets with higher-quality annotation practices.