The output of machine learning (ML) models can only be as good as the data fed into them. For this reason, ensuring data quality is essential when constructing datasets for ML models. This is especially true of human-labeled data, which can be difficult to standardize and whose quality can be hard to assess. To assess the annotation practices behind human-labeled data in the field of machine learning, this paper investigates the datasets used in the most highly cited papers at the AAAI Conference on Artificial Intelligence, an influential machine learning conference. After extracting the datasets from 75 papers across three overlapping publication periods, the top 20 datasets from each period were evaluated. The results show that the majority of datasets omit or underreport significant annotation practices, particularly details about the annotators and the annotation process. This is a concern for the conference and the field more broadly, as the most influential papers build their machine learning algorithms on potentially low-quality data. There is, however, some hope for the field in this regard, as the more recent papers use datasets with higher-quality annotation practices.