Sparse Temporal Convolutional Neural Networks for Keyword Spotting

Abstract

Keyword spotting (KWS) is an essential component of voice-recognition services on smart devices. Its always-on nature demands both high accuracy and real-time response, and low power consumption is another key requirement for KWS devices. In prior research, neural networks have become popular for KWS tasks because of their higher accuracy compared with traditional machine-learning techniques. Among classical neural-network architectures such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs), temporal convolutional networks (TCNs) have recently begun to attract attention. In addition, exploiting sparsity is an effective way to address the growing model size of modern neural-network designs. In this work, a TCN model is trained for KWS on the Google Speech Commands V2 dataset and achieves an accuracy of 94.1%. On top of that model, two forms of sparsity are applied. The first is temporal sparsity: by introducing a Delta convolution layer, the Delta temporal convolutional network (DeltaTCN) achieves 93.6% accuracy with a 72% reduction in floating-point operations (FLOPs) compared with the original TCN model. The second is structural weight sparsity: by imposing sparsity on the weight matrix of each convolution layer, the structurally sparse temporal convolutional network (SSPTCN) achieves 93.6% accuracy with a 70% reduction in FLOPs and a 39% reduction in parameters.
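The temporal (delta) sparsity mentioned above rests on the observation that consecutive audio feature frames change little, so a layer need only process changes that exceed a threshold. The following is a minimal sketch of that delta-update idea in NumPy; the function name, threshold value, and array layout are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def delta_threshold(frames, theta=0.1):
    """Delta (temporal) sparsity sketch: only per-channel changes whose
    magnitude exceeds theta are propagated; smaller changes become zeros,
    so a downstream convolution could skip those multiplications.
    `frames` has shape (T, C): T time steps, C feature channels.
    Hypothetical illustration, not the thesis's Delta convolution layer."""
    ref = frames[0].astype(float).copy()   # last transmitted value per channel
    deltas = np.zeros_like(frames, dtype=float)
    deltas[0] = frames[0]                  # first frame is sent in full
    for t in range(1, len(frames)):
        change = frames[t] - ref
        mask = np.abs(change) >= theta     # significant changes only
        deltas[t][mask] = change[mask]
        ref[mask] = frames[t][mask]        # update reference where a delta was sent
    return deltas
```

The fraction of zeros in the returned delta stream gives a rough proxy for the FLOP savings a delta layer can realize; summing the deltas over time reconstructs each frame to within the threshold `theta`.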
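The structural weight sparsity can likewise be pictured as removing whole groups of a convolution's weights rather than individual entries, which is what allows both FLOP and parameter counts to drop. Below is a hedged sketch of one common structured scheme, magnitude-based output-channel pruning; the function name, keep ratio, and norm criterion are assumptions for illustration and need not match the thesis's scheme.

```python
import numpy as np

def prune_channels(weight, keep_ratio=0.6):
    """Structured weight sparsity sketch: zero out entire output channels
    of a 1-D conv weight (shape: out_channels, in_channels, kernel_size)
    with the smallest L2 norms, keeping `keep_ratio` of the channels.
    Hypothetical illustration, not the thesis's pruning method."""
    norms = np.linalg.norm(weight.reshape(weight.shape[0], -1), axis=1)
    n_keep = max(1, int(round(keep_ratio * weight.shape[0])))
    keep = np.argsort(norms)[-n_keep:]     # indices of the strongest channels
    pruned = np.zeros_like(weight)
    pruned[keep] = weight[keep]            # surviving channels copied intact
    return pruned
```

Because entire channels are zeroed, the pruned rows can be removed from the stored weight tensor and skipped at inference time, unlike unstructured sparsity, which leaves scattered zeros that are harder to exploit on real hardware.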