The Time-Course of Phoneme Category Adaptation in Deep Neural Networks

Conference Paper (2019)
Author(s)

Junrui Ni (University of Illinois at Urbana-Champaign)

Mark Hasegawa-Johnson (University of Illinois at Urbana-Champaign)

Odette Scharenborg (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group
Multimedia Computing
DOI related publication
https://doi.org/10.1007/978-3-030-31372-2_1 (final published version)
Publication Year
2019
Language
English
Pages (from-to)
3-15
Publisher
Springer
ISBN (print)
978-3-030-31371-5
ISBN (electronic)
978-3-030-31372-2
Event
SLSP 2019 (2019-10-14 - 2019-10-16), Ljubljana, Slovenia
Collections
Institutional Repository
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Both human listeners and machines need to adapt their sound categories whenever a new speaker is encountered. This perceptual learning is driven by lexical information. In previous work, we have shown that deep neural network (DNN)-based ASR systems can learn to adapt their phoneme category boundaries from a few labeled examples after exposure (i.e., training) to ambiguous sounds, as humans have been found to do. Here, we investigate the time-course of phoneme category adaptation in a DNN in more detail, with the ultimate aim of investigating the DNN’s ability to serve as a model of human perceptual learning. We do so by providing the DNN with an increasing number of ambiguous retraining tokens (in 10 bins of 4 ambiguous items each) and comparing classification accuracy on the ambiguous items in a held-out test set across the bins. Results showed that DNNs, like human listeners, exhibit a step-like function: perceptual learning is already evident after the first bin (only 4 tokens of the ambiguous phone), with little further adaptation in subsequent bins. In follow-up research, we plan to test specific predictions made by the DNN about human speech processing.
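
The binned retraining procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a toy one-dimensional logistic regression stands in for the DNN, the "acoustic" feature and all data are synthetic, and the function names (cumulative_bins, train_logreg, accuracy) are hypothetical. The sketch also assumes that each retraining run restarts from the baseline model and uses all tokens up to and including the current bin, which the abstract does not specify.

```python
import numpy as np

def cumulative_bins(tokens, labels, n_bins=10, bin_size=4):
    """Yield (bin_index, tokens_so_far, labels_so_far) for growing retraining sets."""
    for k in range(1, n_bins + 1):
        end = k * bin_size
        yield k, tokens[:end], labels[:end]

def train_logreg(x, y, w=0.0, b=0.0, lr=0.5, steps=200):
    """Toy 1-D logistic regression (stand-in for the DNN), fit by gradient descent."""
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        w -= lr * np.mean((p - y) * x)
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(w, b, x, y):
    """Fraction of tokens whose predicted class matches the label."""
    pred = (w * x + b) > 0
    return float(np.mean(pred == y.astype(bool)))

rng = np.random.default_rng(0)

# Synthetic baseline data: class 0 centred at -1, class 1 centred at +1.
x_base = np.concatenate([rng.normal(-1, 0.3, 200), rng.normal(1, 0.3, 200)])
y_base = np.concatenate([np.zeros(200), np.ones(200)])
w0, b0 = train_logreg(x_base, y_base)

# Synthetic ambiguous tokens: acoustically between the categories,
# but labelled as class 1 (the label supplied by lexical context).
x_amb = rng.normal(-0.2, 0.1, 80)
y_amb = np.ones(80)
x_retrain, y_retrain = x_amb[:40], y_amb[:40]   # 10 bins of 4 tokens
x_test, y_test = x_amb[40:], y_amb[40:]         # held-out ambiguous items

# Bin 0 is the baseline model before any retraining.
results = [(0, accuracy(w0, b0, x_test, y_test))]
for k, xs, ys in cumulative_bins(x_retrain, y_retrain):
    w, b = train_logreg(xs, ys, w=w0, b=b0)     # assumption: restart from baseline
    results.append((k, accuracy(w, b, x_test, y_test)))

for k, acc in results:
    print(f"bin {k:2d}: held-out accuracy on ambiguous items = {acc:.2f}")
```

Plotting the per-bin accuracies from such a loop gives the time-course curve. Whether the curve is step-like depends on the model and data, so this toy setup should not be read as reproducing the paper's result.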
