Auditory Kernels for Representing Degraded Speech
Auditory Kernels in an Efficient Representation of Degraded Speech
B. Karslıoğlu (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Dimme de Groot – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Jorge Abraham Martinez Castaneda – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Abstract
We explore the use of biologically inspired auditory kernels, learned by sparse coding of clean read speech, to analyze and reconstruct signals degraded by additive noise. Auditory kernels mimic spectrotemporal filters in the human auditory system, offering insight into how structured acoustic signals can be internally represented and selectively preserved. Our study applies an auditory-kernel-based matching pursuit reconstruction framework to clean, degraded, and noise-only audio, and investigates kernel activation patterns across these input types. The findings reveal kernel selectivity: structured signals such as speech activate a common subset of kernels, while unstructured noise elicits distinct, less overlapping activations, which allows more effective separation. This selectivity results in implicit denoising, preserving intelligibility and perceptual quality even under degradation. By quantifying this behavior across noise types and SNR levels, we show that auditory kernels not only support robust signal reconstruction but also offer a biologically grounded, explainable mechanism for speech enhancement. These insights advance the use of sparse auditory models in both neuroscience and signal processing, motivating future work on adaptive or context-aware dictionaries.
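As a rough illustration of the matching pursuit reconstruction mentioned in the abstract, the sketch below implements a generic greedy, shift-invariant matching pursuit in Python with NumPy. It is not the thesis implementation. The function names (`matching_pursuit`, `reconstruct`, `kernel_usage`), the iteration count, and the stopping threshold are assumptions made for illustration, and the two random unit-norm kernels stand in for a learned auditory kernel dictionary.

```python
import numpy as np

def matching_pursuit(signal, kernels, n_iters=50, tol=1e-12):
    """Greedy shift-invariant matching pursuit (illustrative sketch).

    kernels: list of 1-D unit-norm arrays, each no longer than the signal.
    Returns (events, residual); each event is (kernel_index, offset, amplitude).
    """
    residual = np.asarray(signal, dtype=float).copy()
    events = []
    for _ in range(n_iters):
        best = None
        for k, kernel in enumerate(kernels):
            # Inner product of the residual with kernel k at every valid shift.
            corr = np.correlate(residual, kernel, mode="valid")
            t = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[t]) > abs(best[2]):
                best = (k, t, corr[t])
        k, t, a = best
        if abs(a) < tol:  # nothing left to explain
            break
        # Subtract the selected, scaled kernel from the residual.
        residual[t:t + len(kernels[k])] -= a * kernels[k]
        events.append((k, t, float(a)))
    return events, residual

def reconstruct(events, kernels, length):
    """Sum the selected kernels back into a signal of the given length."""
    out = np.zeros(length)
    for k, t, a in events:
        out[t:t + len(kernels[k])] += a * kernels[k]
    return out

def kernel_usage(events, n_kernels):
    """Count how often each kernel was selected (its activation count)."""
    counts = np.zeros(n_kernels, dtype=int)
    for k, _, _ in events:
        counts[k] += 1
    return counts

# Toy demo: random unit-norm kernels as hypothetical stand-ins for learned ones.
rng = np.random.default_rng(0)
kernels = [v / np.linalg.norm(v) for v in rng.standard_normal((2, 16))]
signal = np.zeros(64)
signal[10:26] += 2.0 * kernels[0]   # kernel 0 at offset 10
signal[40:56] += -1.5 * kernels[1]  # kernel 1 at offset 40

events, residual = matching_pursuit(signal, kernels, n_iters=10)
approx = reconstruct(events, kernels, len(signal))
usage = kernel_usage(events, len(kernels))
```

Running the same decomposition on clean speech, degraded speech, and noise alone, and then comparing the resulting `kernel_usage` counts, is one simple way to inspect the kind of activation patterns the abstract describes. The thesis may quantify overlap differently.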