Challenges and practical guidelines for atypical speech data collection, annotation, usage and sharing

A multi-project perspective

Conference Paper (2025)
Author(s)

Z. Yue (TU Delft - Multimedia Computing)

Mara Barberis (Katholieke Universiteit Leuven)

T.B. Patel (TU Delft - Multimedia Computing, Erasmus MC)

Judith Dineley (King’s College London)

Willemijn Doedens (Koninklijke Auris Groep)

Lottie Stipdonk (Erasmus MC)

Y. Zhang (TU Delft - Multimedia Computing)

Elke De Witte (Erasmus MC)

O.E. Scharenborg (TU Delft - Multimedia Computing)

More Authors

Research Group
Multimedia Computing
DOI related publication
https://doi.org/10.21437/Interspeech.2025-2774
Publication Year
2025
Language
English
Pages (from-to)
3943-3947
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Speech technologies have advanced significantly, yet they are still trained largely on typical speech, which limits their applicability to individuals with speech and language impairments. A key obstacle is the lack of well-annotated, representative atypical speech corpora. This paper presents a multi-project survey and shares first-hand experience of the challenges of collecting, annotating, using, and sharing atypical speech data. It reports experiences from seven research projects that collected atypical speech data, covering both academic and clinical perspectives, and discusses potential issues. The paper also provides practical guidelines for standardising and harmonising data collection practices, so that studies can be compared, replicated, and validated, which is essential for developing more inclusive and effective speech technologies.
