Speech technology for unwritten languages
Odette Scharenborg (Radboud Universiteit Nijmegen, TU Delft - Multimedia Computing)
Laurent Besacier (LIG)
Alan W. Black (Carnegie Mellon University)
Mark Hasegawa-Johnson (University of Illinois at Urbana Champaign)
Florian Metze (Carnegie Mellon University)
Graham Neubig (Carnegie Mellon University)
Sebastian Stueker (Karlsruher Institut für Technologie)
Pierre Godard (LIMSI, Île-de-France)
M. Mueller (Karlsruher Institut für Technologie)
More Authors (External organisation)
Abstract
Speech technology plays an important role in our everyday life. Among other applications, speech is used for human-computer interaction, for instance for information retrieval and on-line shopping. For an unwritten language, however, speech technology is difficult to build, because it cannot be assembled from the standard combination of pre-trained speech-to-text and text-to-speech subsystems. The research presented in this article takes the first steps towards speech technology for unwritten languages. Specifically, the aim of this work was 1) to learn speech-to-meaning representations without using text as an intermediate representation, and 2) to test whether the learned representations are sufficient to regenerate speech or translated text, or to retrieve images that depict the meaning of an utterance in an unwritten language. The results suggest that building systems that go directly from speech to meaning and from meaning to speech, bypassing the need for text, is possible.
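The image-retrieval test mentioned in the abstract can be illustrated with a minimal sketch: an utterance and a set of candidate images are each mapped into a shared embedding space, and the image whose embedding is closest (by cosine similarity) to the utterance embedding is returned. The random projection matrices below are hypothetical stand-ins for trained speech and image encoders, and all dimensions are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative feature dimensions (assumptions, not from the paper).
SPEECH_DIM, IMAGE_DIM, EMBED_DIM = 39, 512, 64

# Stand-in "encoders": in a trained system these would be neural networks
# optimised so that an utterance and the image it describes land close
# together in the shared embedding space.
W_speech = rng.standard_normal((SPEECH_DIM, EMBED_DIM))
W_image = rng.standard_normal((IMAGE_DIM, EMBED_DIM))

def embed(features, W):
    """Project a feature vector into the shared space and L2-normalise it."""
    z = features @ W
    return z / np.linalg.norm(z)

def retrieve(speech_features, image_features_list):
    """Return the index of the candidate image whose embedding has the
    highest cosine similarity with the utterance embedding."""
    query = embed(speech_features, W_speech)
    sims = [query @ embed(img, W_image) for img in image_features_list]
    return int(np.argmax(sims))

# Toy example: one query utterance against three candidate images.
images = [rng.standard_normal(IMAGE_DIM) for _ in range(3)]
utterance = rng.standard_normal(SPEECH_DIM)
best = retrieve(utterance, images)
print("retrieved image index:", best)
```

Because the embeddings are L2-normalised, the dot product of two embeddings equals their cosine similarity, which is the standard ranking criterion in cross-modal retrieval setups of this kind.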