Towards Identity Preserving Normal to Dysarthric Voice Conversion
Wen-Chin Huang (Nagoya University)
B.M. Halpern (Nederlands Kanker Instituut - Antoni van Leeuwenhoek ziekenhuis, TU Delft - Multimedia Computing, Universiteit van Amsterdam)
Lester Phillip Violeta (Nagoya University)
O.E. Scharenborg (TU Delft - Multimedia Computing)
Tomoki Toda (Nagoya University)
Abstract
We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity. Such a framework is essential for (1) clinical decision-making and the alleviation of patient stress, and (2) data augmentation for dysarthric speech recognition. This is an especially challenging task, since the converted samples must capture the severity of dysarthric speech while remaining highly natural and retaining the speaker identity of the normal source speaker. To this end, we adopted a two-stage framework consisting of a sequence-to-sequence model and a nonparallel frame-wise model. Objective and subjective evaluations were conducted on the UASpeech dataset, and the results showed that the method yields reasonable naturalness and captures severity aspects of the pathological speech. On the other hand, the similarity to the normal source speaker’s voice was limited and requires further improvement.
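The two-stage design described in the abstract can be sketched as a pipeline: a first, sequence-to-sequence stage that may change the utterance length (dysarthric speech is typically slower), followed by a second, frame-wise stage that operates on each frame independently. The sketch below is purely illustrative: the class names, the toy time-stretch, and the speaker-embedding offset are assumptions for exposition, not the paper's actual models.

```python
import numpy as np


class Seq2SeqSeverityStage:
    """Stage 1 (sketch): maps normal mel frames to dysarthric-style frames.

    A real sequence-to-sequence model can change the sequence length; here
    a toy frame-doubling stands in for that duration change.
    """

    def convert(self, mel: np.ndarray) -> np.ndarray:
        # Toy stand-in for slowed, dysarthric-style timing: repeat each frame.
        return np.repeat(mel, 2, axis=0)


class FrameWiseIdentityStage:
    """Stage 2 (sketch): a nonparallel, frame-wise mapping that restores the
    source speaker's identity; length-preserving by construction.
    """

    def convert(self, mel: np.ndarray, spk_emb: np.ndarray) -> np.ndarray:
        # Toy frame-wise transform conditioned on a speaker embedding.
        return mel + 0.01 * spk_emb


def normal_to_dysarthric(mel, spk_emb, stage1, stage2):
    """Chain the two stages: severity first, then identity restoration."""
    dysarthric_style = stage1.convert(mel)          # severity and timing
    return stage2.convert(dysarthric_style, spk_emb)  # source identity


# Hypothetical usage with random features: 100 frames of 80-dim mel.
mel = np.random.randn(100, 80)
spk_emb = np.random.randn(80)
out = normal_to_dysarthric(mel, spk_emb,
                           Seq2SeqSeverityStage(), FrameWiseIdentityStage())
```

Note how the pipeline makes the division of labor explicit: only stage 1 is allowed to alter timing, so the frame-wise stage 2 cannot undo the severity characteristics it receives.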