Automatic evaluation of spontaneous oral cancer speech using ratings from naive listeners
Bence Mark Halpern (TU Delft - Multimedia Computing, Nederlands Kanker Instituut - Antoni van Leeuwenhoek ziekenhuis, Universiteit van Amsterdam)
Siyuan Feng (TU Delft - Multimedia Computing)
Rob J.J.H. van Son (Universiteit van Amsterdam, Nederlands Kanker Instituut - Antoni van Leeuwenhoek ziekenhuis)
Michiel W.M. van den Brekel (Nederlands Kanker Instituut - Antoni van Leeuwenhoek ziekenhuis, Universiteit van Amsterdam)
O.E. (Odette) Scharenborg (TU Delft - Multimedia Computing)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
In this paper, we build and compare multiple speech systems for automatically evaluating the severity of speech impairment due to oral cancer from spontaneous speech. To build and evaluate these systems, we collected a new spontaneous oral cancer speech corpus from YouTube, consisting of 124 utterances rated by 100 non-expert listeners and one trained speech-language pathologist, and made it publicly available. We evaluated the systems in two scenarios: one where transcriptions were available (reference-based) and one where transcriptions might not be available (reference-free). Extensive experiments showed that (1) when transcriptions were available, the highest correlation with the human severity ratings was obtained using an automatic speech recognition (ASR) system retrained with oral cancer speech; (2) when transcriptions were not available, the best results were achieved by a LASSO model using modulation spectrum features; (3) naive listeners' ratings are highly similar to the speech pathologist's ratings for speech severity evaluation; and (4) using binary labels led to lower correlations between the automatic methods and the human ratings than using severity scores.
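As an illustration of how agreement between naive listeners and an expert rater might be quantified, the following minimal sketch averages per-listener ratings for each utterance and computes a Pearson correlation against expert ratings. The data, the 1-5 rating scale, and the function names `pearson` and `mean_listener_scores` are hypothetical and chosen for illustration only; the abstract does not specify which correlation measure or rating scale was used.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_listener_scores(ratings):
    """Average the per-listener ratings for each utterance."""
    return [mean(r) for r in ratings]


# Hypothetical data (not from the corpus): 4 utterances, 3 naive listeners each,
# with severity on an assumed 1-5 scale, plus one expert rating per utterance.
naive_ratings = [[1, 2, 1], [3, 3, 4], [5, 4, 5], [2, 2, 3]]
expert_ratings = [1, 3, 5, 2]

naive_mean = mean_listener_scores(naive_ratings)
r = pearson(naive_mean, expert_ratings)
print(f"Pearson r between mean naive and expert ratings: {r:.3f}")
```

The same `pearson` function could be applied to compare the output of an automatic system with the averaged human severity ratings, under the same assumptions.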