A fast method for high-resolution voiced/unvoiced detection and glottal closure/opening instant estimation of speech

Journal Article (2015)

Author(s)

A. Koutrouvelis (TU Delft - Signal Processing Systems)

GP Kafentzis (University of Crete)

N.D. Gaubitch (TU Delft - Signal Processing Systems)

R. Heusdens (TU Delft - Signal Processing Systems)

Research Group

Signal Processing Systems

DOI related publication

https://doi.org/10.1109/TASLP.2015.2506263

Keywords

Speech analysis; Glottal closure instants (GCIs); Voiced/unvoiced detection (VUD); Glottal opening instants (GOIs); Pitch estimation

To reference this document use:

https://resolver.tudelft.nl/uuid:ecf79431-f9f4-48da-a9c6-93ed95a253db

Publication Year

2015

Language

English

Bibliographical Note

Accepted Author Manuscript

Issue number

2

Volume number

24

Pages (from-to)

316-328

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

We propose a fast speech analysis method that simultaneously performs high-resolution voiced/unvoiced detection (VUD) and accurate estimation of glottal closure and glottal opening instants (GCIs and GOIs, respectively). The proposed algorithm exploits the structure of the glottal flow derivative to estimate GCIs and GOIs only in voiced speech using simple time-domain criteria. We compare our method with well-known GCI/GOI methods, namely, the dynamic programming projected phase-slope algorithm (DYPSA), yet another GCI/GOI algorithm (YAGA), and speech event detection using the residual excitation and a mean-based signal (SEDREAMS). Furthermore, we examine the performance of these methods when combined with state-of-the-art VUD algorithms, namely, the robust algorithm for pitch tracking (RAPT) and the summation of residual harmonics (SRH). Experiments conducted on the APLAWD and SAM databases show that the proposed algorithm outperforms the state-of-the-art combinations of VUD and GCI/GOI algorithms with respect to almost all evaluation criteria for clean speech. Experiments on speech contaminated with several noise types (white Gaussian, babble, and car-interior) are also presented and discussed. The proposed algorithm outperforms the state-of-the-art combinations in most evaluation criteria for signal-to-noise ratios greater than 10 dB.

Files

3792553.pdf

(pdf | 1.44 Mb)

License info not available