The Time-Course of Phoneme Category Adaptation in Deep Neural Networks

Conference Paper (2019)
Author(s)

Junrui Ni (University of Illinois at Urbana-Champaign)

Mark Hasegawa-Johnson (University of Illinois at Urbana-Champaign)

O.E. Scharenborg (TU Delft - Multimedia Computing)

Research Group
Multimedia Computing
Copyright
© 2019 Junrui Ni, Mark Hasegawa-Johnson, O.E. Scharenborg
DOI related publication
https://doi.org/10.1007/978-3-030-31372-2_1
More Info
Publication Year
2019
Language
English
Pages (from-to)
3-15
ISBN (print)
978-3-030-31371-5
ISBN (electronic)
978-3-030-31372-2
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Both human listeners and machines need to adapt their sound categories whenever a new speaker is encountered. This perceptual learning is driven by lexical information. In previous work, we have shown that deep neural network-based (DNN) ASR systems can learn to adapt their phoneme category boundaries from a few labeled examples after exposure (i.e., training) to ambiguous sounds, as humans have been found to do. Here, we investigate the time-course of phoneme category adaptation in a DNN in more detail, with the ultimate aim of investigating the DNN's ability to serve as a model of human perceptual learning. We do so by providing the DNN with an increasing number of ambiguous retraining tokens (in 10 bins of 4 ambiguous items) and comparing classification accuracy on the ambiguous items in a held-out test set across the bins. The results showed that the DNNs, like human listeners, exhibit a step-like function: the DNNs show perceptual learning after only the first bin (just 4 tokens of the ambiguous phone), with little further adaptation for subsequent bins. In follow-up research, we plan to test specific predictions made by the DNN about human speech processing.
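The bin-by-bin adaptation protocol described above can be illustrated with a minimal sketch: a pretrained phoneme classifier is fine-tuned on successive bins of 4 ambiguous tokens, and accuracy on a held-out set of ambiguous items is recorded after each bin. This is not the paper's actual code; the model architecture, feature dimension, optimizer settings, number of epochs per bin, and the use of synthetic placeholder data are all illustrative assumptions.

```python
# Hypothetical sketch of bin-by-bin phoneme category adaptation (assumed
# setup, not the paper's configuration): fine-tune on each bin of 4
# ambiguous tokens, then evaluate on a held-out ambiguous test set.
import torch
import torch.nn as nn

FEATURE_DIM, NUM_PHONEMES, NUM_BINS, BIN_SIZE = 40, 38, 10, 4

# Stand-in classifier; the paper's DNN architecture is not reproduced here.
model = nn.Sequential(
    nn.Linear(FEATURE_DIM, 256), nn.ReLU(),
    nn.Linear(256, NUM_PHONEMES),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Synthetic placeholders for the retraining tokens (10 bins x 4 items) and
# the held-out ambiguous test items; real experiments would use acoustic
# features with lexically determined labels.
retrain_x = torch.randn(NUM_BINS * BIN_SIZE, FEATURE_DIM)
retrain_y = torch.randint(0, NUM_PHONEMES, (NUM_BINS * BIN_SIZE,))
test_x = torch.randn(100, FEATURE_DIM)
test_y = torch.randint(0, NUM_PHONEMES, (100,))

accuracy_per_bin = []
for b in range(NUM_BINS):
    # Fine-tune on the next bin of 4 ambiguous tokens.
    xb = retrain_x[b * BIN_SIZE:(b + 1) * BIN_SIZE]
    yb = retrain_y[b * BIN_SIZE:(b + 1) * BIN_SIZE]
    for _ in range(10):  # a few passes per bin (assumed)
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()

    # Evaluate on the held-out ambiguous test items after this bin.
    with torch.no_grad():
        acc = (model(test_x).argmax(dim=1) == test_y).float().mean().item()
    accuracy_per_bin.append(acc)

# A step-like curve (large gain after the first bin, little change after)
# would mirror the pattern reported in the abstract.
print(accuracy_per_bin)
```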
