An 8.62-μW 75-dB DRSoC Fully Integrated SoC for Spoken Language Understanding

Journal Article (2025)
Author(s)

Sheng Zhou (Universität Zürich, ETH Zürich)

Zixiao Li (Universität Zürich, ETH Zürich)

Longbiao Cheng (Universität Zürich, ETH Zürich)

Jerome Hadorn (Universität Zürich, ETH Zürich)

C. Gao (TU Delft - Electronics)

Qinyu Chen (Universiteit Leiden)

Tobi Delbruck (Universität Zürich, ETH Zürich)

Kwantae Kim (Aalto University)

Shih Chii Liu (ETH Zürich, Universität Zürich)

Research Group
Electronics
DOI related publication
https://doi.org/10.1109/JSSC.2025.3602936
Publication Year
2025
Language
English
Bibliographical Note
Green Open Access added to TU Delft Institutional Repository as part of the Taverne amendment. More information about this copyright law amendment can be found at https://www.openaccess.nl. Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
Issue number
11
Volume number
60
Pages (from-to)
4002-4017
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

We present a sub-10-µW fully integrated SoC for on-device spoken language understanding (SLU). Its analog feature extractor (FEx) applies global and per-channel automatic gain control (AGC) to extend the system's dynamic range (DR), a critical requirement for real-world scenarios, including far-field operations. The on-chip streaming-mode recurrent neural network (RNN) accelerator exploits temporal sparsity and pooling, reducing its power by 2.3×. By combining hardware-aware training with a behavioral model of the FEx that captures circuit nonidealities, the network is trained to maintain SLU accuracy despite chip-to-chip variation. Fabricated in a 65-nm CMOS process, the SoC occupies 2.23 mm² and consumes 8.62 µW for end-to-end SLU. The 16-channel FEx achieves 93-dB DR while dissipating 1.85 µW at a 100-Hz feature frame rate. The SoC is evaluated on the 32-class Fluent Speech Commands dataset (FSCD), achieving 92.9% accuracy for 2.8-mVrms inputs while maintaining >85% accuracy over a 75-dB input range.
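The headline figures in the abstract can be related with a little arithmetic. The sketch below is illustrative only and rests on two assumptions not stated in the abstract: that the dB figures follow the 20·log10 convention for voltage amplitudes, and that the 1.85-µW FEx power is included in the 8.62-µW end-to-end total. The abstract does not say how the remaining power is split among the other on-chip blocks, so the sketch reports only the non-FEx total.

```python
import math

# Figures quoted in the abstract (units in the names).
SOC_POWER_UW = 8.62      # end-to-end SLU power of the SoC
FEX_POWER_UW = 1.85      # 16-channel analog FEx
FRAME_RATE_HZ = 100.0    # FEx feature frame rate
FEX_DR_DB = 93.0         # FEx dynamic range
SLU_RANGE_DB = 75.0      # input range with >85% FSCD accuracy


def db_to_amplitude_ratio(db: float) -> float:
    """Linear amplitude ratio for a level difference in dB (assumed 20*log10 convention)."""
    return 10.0 ** (db / 20.0)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Inverse of db_to_amplitude_ratio."""
    return 20.0 * math.log10(ratio)


# Power not spent in the FEx, assuming the FEx is counted inside the SoC total.
non_fex_power_uw = SOC_POWER_UW - FEX_POWER_UW
fex_share = FEX_POWER_UW / SOC_POWER_UW

# FEx energy per feature frame: uW * 1e3 = nW, and nW / Hz = nJ.
fex_energy_per_frame_nj = FEX_POWER_UW * 1e3 / FRAME_RATE_HZ

# Linear span (largest / smallest amplitude) of each dB figure.
fex_dr_ratio = db_to_amplitude_ratio(FEX_DR_DB)
slu_range_ratio = db_to_amplitude_ratio(SLU_RANGE_DB)

print(f"non-FEx power: {non_fex_power_uw:.2f} uW ({1 - fex_share:.1%} of total)")
print(f"FEx energy per frame: {fex_energy_per_frame_nj:.1f} nJ")
print(f"93-dB DR    -> {fex_dr_ratio:,.0f}x amplitude span")
print(f"75-dB range -> {slu_range_ratio:,.0f}x amplitude span")
```

Under these assumptions, roughly 6.77 µW (about 78.5% of the total) goes to blocks other than the FEx, the FEx spends about 18.5 nJ per feature frame, and the 93-dB and 75-dB figures correspond to amplitude spans of roughly 44,668× and 5,623×, respectively.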

Files

License info not available

File under embargo until 16-03-2026