A Quest through Interconnected Datasets: Research on Annotation Practices in Highly Cited Audio Machine Learning Work and Their Utilized Datasets

Annotation Practices in Datasets Utilized by The International Conference on Acoustics, Speech, and Signal Processing (ICASSP) Conferences: A Transparency Analysis

Bachelor Thesis (2023)
Authors

D. Taşcılar (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Doğa Taşcılar
Publication Year
2023
Language
English
Graduation Date
28-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This research examines how transparently ICASSP conference papers and the documentation of their datasets report the datasets' annotation practices. The five most-cited papers and 51 unique resources in total were considered. All of the selected papers used at least one dataset. For every paper, an extensive metadata search was conducted to trace each dataset back to its original data source. These searches covered both the paper itself, including its sections and references, and sources outside the paper, through extensive web queries. Analysis of the papers, published in 2021 and 2022, and their associated datasets revealed varying levels of transparency. Original dataset creators provide comprehensive information, while papers using modified datasets offer limited details on the initial annotations. Emphasizing the need for accountability, this study suggests that papers using datasets should trace them back to the initial dataset and provide explicit comments. The findings underscore the importance of ensuring sufficient information in initial datasets and of promoting transparency and traceability in dataset annotation practices within the ICASSP community.

Files

RP_Final_Doga.pdf
(pdf | 2.09 Mb)
License info not available