The representation of speech in deep neural networks

Conference Paper (2019)

Author(s)

O.E. Scharenborg (TU Delft - Multimedia Computing, Radboud Universiteit Nijmegen)

Nikki van der Gouw (Radboud Universiteit Nijmegen)

M. Larson (TU Delft - Multimedia Computing, Radboud Universiteit Nijmegen)

Elena Marchiori (Radboud Universiteit Nijmegen)

Multimedia Computing

Copyright

DOI related publication

https://doi.org/10.1007/978-3-030-05716-9_16

Deep neural networks, Speech representations, Visualizations

To reference this document use:

https://resolver.tudelft.nl/uuid:4e628702-20fc-4131-8389-e872e131a32c

More Info

Publication Year

2019

Language

English

Bibliographical Note

Accepted author manuscript

Pages (from-to)

194-205

ISBN (print)

978-3-030-05715-2

ISBN (electronic)

978-3-030-05716-9

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this paper, we investigate the connection between how people understand speech and how speech is understood by a deep neural network (DNN). A naïve, general feed-forward DNN was trained on the task of vowel/consonant classification. Subsequently, the representations of the speech signal in the different hidden layers of the DNN were visualized. The visualizations allow us to study the distances between the representations of different types of input frames and to observe the clustering structures formed by these representations. In the different visualizations, the input frames were labeled with different linguistic categories: sounds in the same phoneme class, sounds with the same manner of articulation, and sounds with the same place of articulation. We investigate whether the DNN clusters speech representations in a way that corresponds to these linguistic categories, and we observe evidence that the DNN does indeed appear to learn structures that humans use to understand speech, without being explicitly trained to do so.
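The workflow described in the abstract (train a feed-forward network on a binary vowel/consonant task, then inspect what each hidden layer has learned) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic Gaussian "frames" stand in for real acoustic features, the layer sizes and learning rate are arbitrary, and PCA is used as a simple stand-in projection because the abstract does not name the visualization method. The helper names (`init_layers`, `forward`, `train`, `pca_2d`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic stand-in for acoustic frames, not real speech data.
# Class 0 plays the role of "consonant", class 1 the role of "vowel".
n_per_class, n_features = 200, 20
X = np.vstack([
    rng.normal(-1.0, 1.0, size=(n_per_class, n_features)),
    rng.normal(+1.0, 1.0, size=(n_per_class, n_features)),
])
y = np.repeat([0, 1], n_per_class)


def init_layers(sizes, rng):
    """He-initialised (weights, bias) pairs for consecutive layer sizes."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(X, layers):
    """Return activations of every layer: ReLU hidden layers, sigmoid output."""
    activations, h = [], X
    for i, (W, b) in enumerate(layers):
        z = h @ W + b
        is_output = i == len(layers) - 1
        h = 1.0 / (1.0 + np.exp(-z)) if is_output else np.maximum(z, 0.0)
        activations.append(h)
    return activations


def train(X, y, layers, lr=0.05, epochs=300):
    """Full-batch gradient descent on mean binary cross-entropy."""
    for _ in range(epochs):
        acts = forward(X, layers)
        inputs = [X] + acts[:-1]
        # Gradient of the loss w.r.t. the output pre-activation.
        delta = (acts[-1] - y[:, None]) / len(X)
        for i in reversed(range(len(layers))):
            W, b = layers[i]
            grad_W = inputs[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                # Backpropagate through the ReLU of the previous layer.
                delta = (delta @ W.T) * (inputs[i] > 0)
            layers[i] = (W - lr * grad_W, b - lr * grad_b)
    return layers


def pca_2d(H):
    """Project rows of H onto their first two principal components."""
    Hc = H - H.mean(axis=0)
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    return Hc @ Vt[:2].T


# ASSUMPTION: two hidden layers of 16 units; the paper's architecture may differ.
layers = train(X, y, init_layers([n_features, 16, 16, 1], rng))
acts = forward(X, layers)

# One 2-D point cloud per hidden layer, ready to be scatter-plotted.
projections = [pca_2d(h) for h in acts[:-1]]
accuracy = ((acts[-1][:, 0] > 0.5) == y).mean()
```

Each entry of `projections` could then be drawn as a scatter plot and colored by any frame-level label, for example phoneme class or manner or place of articulation, to check whether frames sharing a label fall close together even though the network was only trained on the vowel/consonant distinction.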

Files

Mmm2019_final.pdf

(pdf | 0.844 Mb)

- Embargo expired on 11-12-2019

License info not available