Automatic evaluation of spontaneous oral cancer speech using ratings from naive listeners

Journal Article (2023)

Author(s)

Bence Mark Halpern (Nederlands Kanker Instituut - Antoni van Leeuwenhoek ziekenhuis, TU Delft - Electrical Engineering, Mathematics and Computer Science, Universiteit van Amsterdam)

Siyuan Feng (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Rob van Son (Nederlands Kanker Instituut - Antoni van Leeuwenhoek ziekenhuis, Universiteit van Amsterdam)

Michiel van den Brekel (Nederlands Kanker Instituut - Antoni van Leeuwenhoek ziekenhuis, Universiteit van Amsterdam)

Odette Scharenborg (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group

Multimedia Computing

Oral cancer; Pathological speech; Automatic speech evaluation

DOI related publication

https://doi.org/10.1016/j.specom.2023.03.008 (final published version)

To reference this document use

https://resolver.tudelft.nl/uuid:cccdbfd5-97fd-43bd-8569-d841a700a75c

Publication Year

2023

Language

English

Journal title

Speech Communication

Volume number

149

Pages (from-to)

84-97

Downloads counter

397

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this paper, we build and compare multiple speech systems for automatically evaluating the severity of speech impairment due to oral cancer from spontaneous speech. To build and evaluate these systems, we collected a new spontaneous oral cancer speech corpus from YouTube, consisting of 124 utterances rated by 100 non-expert listeners and one trained speech-language pathologist, and made it publicly available. We evaluated the systems in two scenarios: one where transcriptions were available (reference-based) and one where transcriptions might not be available (reference-free). Extensive experiments showed that (1) when transcriptions were available, the highest correlation with the human severity ratings was obtained using an automatic speech recognition (ASR) system retrained with oral cancer speech; (2) when transcriptions were not available, the best results were achieved by a LASSO model using modulation spectrum features; (3) naive listeners' ratings are highly similar to the speech pathologist's ratings for speech severity evaluation; and (4) using binary labels led to lower correlations between the automatic methods and the human ratings than using severity scores.
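The abstract reports results as correlations between automatic scores and human severity ratings. As a rough illustration only, the sketch below averages each utterance's ratings over listeners and computes a Pearson correlation against system scores. The choice of Pearson correlation, the averaging step, the function names, and all numbers are assumptions made for this example; they are not taken from the paper or its corpus.

```python
from math import sqrt
from statistics import mean


def mean_rating(ratings_per_listener):
    """Average one utterance's ratings over all listeners (assumed aggregation)."""
    return mean(ratings_per_listener)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up toy data: three listeners rating four utterances (illustrative only).
listener_ratings = [
    [1, 2, 1],
    [3, 3, 4],
    [4, 5, 5],
    [2, 2, 3],
]
# Hypothetical automatic severity scores, one per utterance.
system_scores = [0.1, 0.6, 0.9, 0.4]

human_scores = [mean_rating(r) for r in listener_ratings]
r = pearson(system_scores, human_scores)
print(f"Pearson r = {r:.3f}")
```

With real data, `listener_ratings` would hold the per-utterance ratings from the corpus and `system_scores` the output of whichever evaluation system is being compared.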

Files

1_s2.0_S016763932300047X_main.... (pdf)

(pdf | 1.19 MB)