A. Despan

Bachelor thesis (2025) - A. Despan, A.M. Demetriou, C.C.S. Liem, J. Yang

High-impact vision research still rests on datasets whose labels arrive via opaque, rarely documented pipelines. To understand how serious the problem is within a major venue, we audited 75 TPAMI papers (2009-2024) that rely on or introduce datasets. Each dataset was coded against a 27-item checklist adapted from "Garbage in, Garbage out", spanning annotator recruitment, training, compensation, overlap resolution and more. Across the corpus, 37% of the expected annotation metadata is missing; the rate changes little between recent (2022-2024) and older cohorts. The scarcest fields are labeller-population rationale (76.6% absent), prescreening criteria (73.4%), total annotators (68.8%), compensation (67.2%) and training procedures (62.5%). Documentation quality shows virtually no correlation with a paper’s citation impact, suggesting that community prestige does not buy transparency. A handful of well-curated datasets achieve >75% completeness, showing that thorough documentation is possible when incentives align. The median TPAMI benchmark still ships with an unverifiable "ground truth", threatening the reproducibility and fairness claims of downstream models. We advocate that journals and conferences require a concise, checklist-based annotation statement, mirroring existing ethics and reproducibility forms, to ensure that future vision systems are built (and evaluated) on transparent, trustworthy data foundations. ...