Repository hosted by TU Delft Library

Phoneme-group specific octave-band weights in predicting speech intelligibility

Author: Steeneken, H.J.M. · Houtgast, T.
Institution: TNO Technische Menskunde
Source: Speech Communication, 38 (3-4), 399-411
Identifier: 11513
doi: 10.1016/S0167-6393(02)00011-0
Keywords: Acoustics and Audiology · Diagnostic prediction · Frequency-importance function · Objective measurement · Octave-band contributions · Phoneme groups · Speech intelligibility · Speech transmission index · STI · Bandwidth · Mathematical models · Signal to noise ratio · Transfer functions · Speech communication


In an earlier study we derived robust frequency-weighting functions for predicting the intelligibility of short nonsense words. These frequency-weighting functions are applied in intelligibility prediction methods such as the speech transmission index (STI). Six independent experiments revealed essentially similar frequency-weighting functions for the prediction of nonsense word scores with respect to signal-to-noise ratio and gender [Speech Communication 28 (1999) 109]. Although the frequency weightings do not vary significantly with signal-to-noise ratio or gender, other studies have shown that different types of speech material (i.e., nonsense words, phonetically balanced words and connected discourse) result in quite different frequency-weighting functions. This may be related to the distribution of specific phonemes in the test material. In order to obtain a more generic description of the frequency weighting, four relevant groups of phonemes were identified. In situations with reduced intelligibility, a small confusion rate of the phonemes within each group was observed. For each group, a specific frequency-weighting function and a good prediction of the phoneme group scores could be obtained. It was shown that word scores could be predicted from these (weighted) phoneme group scores with a prediction accuracy of ca. 4% (this corresponds to a signal-to-noise ratio of about 1 dB). Hence, this method provides a more generic way to predict intelligibility scores for different types of speech material.
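
The two weighting steps mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model. It assumes that a group-specific index is a normalized weighted sum of per-octave-band contributions, and that a word-level score is a normalized weighted average of phoneme-group scores. The function names, the group labels and all numeric values are hypothetical placeholders rather than values from the paper, and correction terms used in the full STI method are left out.

```python
from typing import Mapping, Sequence


def weighted_band_index(band_values: Sequence[float],
                        band_weights: Sequence[float]) -> float:
    """Normalized weighted sum of per-octave-band contributions.

    Simplifying assumption: a plain weighted sum, without the
    correction terms of the full STI procedure.
    """
    if len(band_values) != len(band_weights):
        raise ValueError("one weight is required per octave band")
    total = sum(band_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(v * w for v, w in zip(band_values, band_weights)) / total


def combine_group_scores(group_scores: Mapping[str, float],
                         group_weights: Mapping[str, float]) -> float:
    """Normalized weighted average of phoneme-group scores.

    Assumption: a linear combination; the abstract only states that
    word scores are predicted from weighted phoneme-group scores.
    """
    if set(group_scores) != set(group_weights):
        raise ValueError("scores and weights must cover the same groups")
    total = sum(group_weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(group_scores[g] * group_weights[g] for g in group_scores) / total


if __name__ == "__main__":
    # Placeholder inputs for illustration only (not data from the paper).
    bands = [0.2, 0.4, 0.6, 0.8, 0.9, 0.7, 0.5]      # seven octave bands
    weights = [1.0, 1.0, 2.0, 2.0, 2.0, 1.0, 1.0]    # hypothetical weighting
    print(weighted_band_index(bands, weights))

    scores = {"group_1": 0.9, "group_2": 0.7, "group_3": 0.8, "group_4": 0.6}
    importance = {"group_1": 1.0, "group_2": 1.0, "group_3": 1.0, "group_4": 1.0}
    print(combine_group_scores(scores, importance))
```

Normalizing by the sum of the weights keeps both results on the same scale as the inputs, so a different weighting function per phoneme group can be swapped in without rescaling.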