Speech Production Modelling and Analysis

Abstract

The first part of the present thesis reviews the speech production mechanism and several models of the glottal flow derivative waveform and of the vocal tract filter. The source-filter model is investigated in depth, since it is the most important "ingredient" of linear prediction analysis. We also review seven linear prediction (LP) methods based on the same general LP optimization framework. Moreover, we examine the importance of pre-emphasis and glottal cancellation prior to LP. The second part of the thesis provides an experimental evaluation of the LP methods, combined with several pre-emphasis and glottal-cancellation techniques, in two general application areas. The first area consists of applications that aim to estimate the true glottal flow or glottal flow derivative signal. The second consists of applications that aim to find a sparse residual. In particular, the following factors are investigated: the sparsity of the residual, measured with the Gini index; the estimation accuracy of the glottal flow derivative, measured with the signal-to-noise ratio (SNR); the estimation accuracy of the vocal tract spectral magnitude, measured with the log spectral distortion (LSD) metric; and the probability of obtaining a stable linear prediction filter. All these factors are evaluated for clean and reverberant speech signals. In most cases, the sparse linear prediction methods and the iteratively reweighted least squares method, combined with the second-order pre-emphasis filter, give the most accurate glottal flow derivative estimates, the most accurate vocal tract estimates and the sparsest residuals. Finally, we compare several linear prediction methods in the context of the speech dereverberation method proposed in [1, 2], which enhances the reverberant residual obtained via the autocorrelation method.
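To make the quantities named above concrete, the following NumPy sketch estimates an all-pole vocal tract filter with the autocorrelation method (Levinson-Durbin recursion), computes the LP residual, its Gini index, a stability check, and the LSD between two all-pole envelopes. It is an illustration under stated assumptions, not the thesis's implementation: the second-order pre-emphasis is assumed to be two cascaded first-order filters with a hypothetical α = 0.97, the Hann window and FFT size are arbitrary choices, the LSD is computed on gain-normalized envelopes 1/|A|, and all function names are hypothetical.

```python
import numpy as np


def pre_emphasis_2nd_order(x, alpha=0.97):
    # Assumed second-order pre-emphasis: two cascaded first-order filters,
    # P(z) = (1 - alpha z^-1)^2 = 1 - 2 alpha z^-1 + alpha^2 z^-2.
    p = np.array([1.0, -2.0 * alpha, alpha ** 2])
    return np.convolve(np.asarray(x, dtype=float), p)[: len(x)]


def lpc_autocorrelation(x, order):
    """Autocorrelation-method LP; returns a = [1, a_1, ..., a_p] of A(z)."""
    xw = np.asarray(x, dtype=float) * np.hanning(len(x))  # Hann-windowed frame
    n = len(xw)
    r = np.array([np.dot(xw[: n - k], xw[k:]) for k in range(order + 1)])
    # Levinson-Durbin recursion on the Toeplitz normal equations.
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -np.dot(a[:i], r[i:0:-1]) / err  # reflection coefficient
        a[: i + 1] = a[: i + 1] + k * a[i::-1]
        err *= 1.0 - k * k
    return a


def lp_residual(x, a):
    # Inverse filtering: e[n] = x[n] + a_1 x[n-1] + ... + a_p x[n-p].
    return np.convolve(np.asarray(x, dtype=float), a)[: len(x)]


def is_stable(a):
    # 1/A(z) is stable iff every zero of A(z) lies strictly inside the unit circle.
    return bool(np.all(np.abs(np.roots(a)) < 1.0))


def gini_index(c):
    # Gini index of |c|: 0 for a flat vector, 1 - 1/N for a single nonzero entry.
    c = np.sort(np.abs(np.asarray(c, dtype=float)))
    n = len(c)
    k = np.arange(1, n + 1)
    return 1.0 - 2.0 * np.sum((c / c.sum()) * (n - k + 0.5) / n)


def log_spectral_distortion(a_ref, a_est, n_fft=512):
    # RMS difference in dB between gain-normalized envelopes 1/|A(e^jw)|.
    db_ref = -20.0 * np.log10(np.abs(np.fft.rfft(a_ref, n_fft)))
    db_est = -20.0 * np.log10(np.abs(np.fft.rfft(a_est, n_fft)))
    return float(np.sqrt(np.mean((db_ref - db_est) ** 2)))
```

A typical use on one speech frame would be `a = lpc_autocorrelation(pre_emphasis_2nd_order(frame), order)` followed by `gini_index(lp_residual(frame, a))`; a higher Gini index indicates a sparser residual. In exact arithmetic the autocorrelation method always yields a stable filter, which is why `is_stable` is mainly of interest for the other LP variants.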
In the context of this application, we show that the sparse linear prediction method and the weighted linear prediction method combined with a second-order pre-emphasis filter perform better than the autocorrelation method.
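Weighted LP and a 1-norm ("sparse") LP can both be written as a weighted least-squares problem on the prediction error e[n], minimizing Σ w[n] e[n]². The sketch below illustrates this under assumptions that are not taken from the thesis: `weighted_lp` solves the weighted problem covariance-style (no window), `ste_weights` uses the short-time energy of the m preceding samples as the weight (m = 20 is hypothetical), and `sparse_lp_irls` approximates 1-norm LP by iteratively reweighted least squares with hypothetical `n_iter` and `eps`. The thesis's sparse LP and weighted LP methods may use different cost functions, weights or solvers.

```python
import numpy as np


def weighted_lp(x, order, w):
    """Minimize sum_n w[n] * e[n]^2, e[n] = x[n] + sum_k a_k x[n-k].

    Covariance-style: n runs over order..N-1 with no window.
    Returns a = [1, a_1, ..., a_p].
    """
    x = np.asarray(x, dtype=float)
    n_idx = np.arange(order, len(x))
    # Column k-1 holds x[n-k] for k = 1..order.
    X = np.column_stack([x[n_idx - k] for k in range(1, order + 1)])
    sw = np.sqrt(np.asarray(w, dtype=float)[n_idx])
    # Scale rows by sqrt(w) so ordinary least squares solves the weighted problem.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], -x[n_idx] * sw, rcond=None)
    return np.concatenate(([1.0], coef))


def ste_weights(x, m=20, floor=1e-8):
    # Hypothetical WLP-style weights: energy of the m samples preceding n,
    # w[n] = x[n-1]^2 + ... + x[n-m]^2 (plus a small floor).
    x = np.asarray(x, dtype=float)
    energy = np.convolve(x ** 2, np.ones(m))[: len(x)]  # x[n]^2 + ... + x[n-m+1]^2
    return np.concatenate(([0.0], energy[:-1])) + floor


def sparse_lp_irls(x, order, n_iter=10, eps=1e-4):
    # IRLS approximation of min_a sum_n |e[n]| (1-norm, "sparse" LP):
    # each pass solves a weighted LS problem with w[n] = 1 / max(|e[n]|, eps).
    x = np.asarray(x, dtype=float)
    w = np.ones(len(x))
    for _ in range(n_iter):
        a = weighted_lp(x, order, w)
        e = np.convolve(x, a)[: len(x)]
        w = 1.0 / np.maximum(np.abs(e), eps)
    return a
```

With uniform weights, `weighted_lp` reduces to the ordinary covariance method. Comparing methods by residual sparsity would then amount to inverse filtering with each coefficient vector and computing a sparsity measure such as the Gini index of each residual.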
