A Quest through Interconnected Datasets: Research on Annotation Practices in Highly Cited Audio Machine Learning Work and Their Utilized Datasets

Annotation Practices in Datasets Utilized in International Conference on Acoustics, Speech, and Signal Processing (ICASSP) Papers: A Transparency Analysis

Abstract

This research examines how transparently ICASSP conference papers, and the documentation of the datasets they use, report those datasets' annotation practices. The five most-cited papers and 51 unique resources in total were considered. Every selected paper used at least one dataset. For each paper, an extensive metadata search was conducted to trace each dataset back to its original data source. These searches covered both the paper contents, such as its sections and references, and external sources, through extensive web queries. Analysis of the papers published in 2021 and 2022 and their associated datasets revealed varying levels of transparency. Original dataset creators provide comprehensive information, while papers using modified datasets offer limited details on the initial annotations. Emphasizing the need for accountability, this study recommends that papers using datasets trace them back to the initial dataset and comment explicitly on its annotation practices. The findings underscore the importance of ensuring that initial datasets provide sufficient information, and of promoting transparency and traceability in dataset annotation practices within the ICASSP community.