Behind the Labels: Transparency Pitfalls in Annotation Practices for Societally Impactful ML

A deep dive into annotation transparency and consistency in the CVPR corpus

Bachelor Thesis (2025)
Author(s)

C. Scorţia (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A.M. Demetriou – Mentor (TU Delft - Multimedia Computing)

Cynthia C. S. Liem – Mentor (TU Delft - Multimedia Computing)

J. Yang – Graduation committee member (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
24-06-2025
Awarding Institution
Delft University of Technology
Project
['CSE3000 Research Project']
Programme
['Computer Science and Engineering']
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This study investigates annotation and reporting practices in machine learning (ML) research, focusing on societally impactful applications presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). By structurally analyzing the 75 most-cited CVPR papers from the past 2, 5, and 15 years, we evaluate how the human annotations that form the foundation of supervised ML are documented. We introduce a 27-field annotation-reporting schema and apply it to 60 datasets, revealing that nearly 30% of relevant information is routinely omitted. Key findings include pervasive underreporting of annotator details such as training, prescreening, and inter-rater reliability (IRR) metrics. While popular datasets like COCO and ImageNet are widely used, transparency about their annotation methodologies remains inconsistent. A few fields stand out: basic metadata, such as how annotators were selected and how label overlap was handled, strongly predicts overall documentation quality. Our findings support previous calls for standardization and underscore the need for institutionalized reporting practices to ensure reproducibility, fairness, and trust in ML systems.

Files

Behind_the_Labels.pdf
(pdf | 4.18 MB)
License info not available