Auditory Kernels for Representing Degraded Speech

Auditory Kernels in an Efficient Representation of Degraded Speech

Bachelor Thesis (2025)
Author(s)

B. Karslıoğlu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Dimme de Groot – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Jorge Abraham Martinez Castaneda – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
Publication Year
2025
Language
English
Graduation Date
25-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

We explore the use of biologically inspired auditory kernels, learned by sparse coding on clean read speech, to analyze and reconstruct signals degraded with additive noise. Auditory kernels mimic spectrotemporal filters in the human auditory system, offering insight into how structured acoustic signals can be internally represented and selectively preserved. Our study applies an auditory kernel-based matching pursuit reconstruction framework to clean, degraded, and noise-only audio, and investigates kernel activation patterns across these input types. The findings reveal kernel selectivity: structured signals like speech activate a common subset of kernels, while unstructured noise elicits distinct, less overlapping activations, which allows more effective separation of the two. This selectivity results in implicit denoising, preserving intelligibility and perceptual quality even under degradation. By quantifying this behavior across noise types and SNR levels, we show that auditory kernels not only support robust signal reconstruction but also offer a biologically grounded, explainable mechanism for speech enhancement. These insights advance the use of sparse auditory models in both neuroscience and signal processing, motivating future work on adaptive or context-aware dictionaries.
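As a rough illustration of the kind of matching pursuit reconstruction the abstract refers to, the following is a minimal sketch and not the thesis implementation. It assumes a small dictionary of unit-norm, shift-invariant kernels that are each shorter than the signal. The function names `toy_kernel` and `matching_pursuit`, the Hann-windowed sinusoids standing in for learned auditory kernels, and all parameter values are hypothetical choices made for this example.

```python
import numpy as np

def toy_kernel(freq_hz, length=64, fs=16000):
    """Hypothetical stand-in for a learned auditory kernel:
    a Hann-windowed sinusoid scaled to unit norm."""
    t = np.arange(length) / fs
    g = np.hanning(length) * np.sin(2 * np.pi * freq_hz * t)
    return g / np.linalg.norm(g)

def matching_pursuit(signal, kernels, n_iter=50, tol=1e-12):
    """Greedy shift-invariant matching pursuit (illustrative sketch).

    Assumes every kernel is unit-norm and shorter than the signal.
    Returns (events, reconstruction, residual); each event is
    (kernel_index, offset, amplitude).
    """
    residual = np.asarray(signal, dtype=float).copy()
    recon = np.zeros_like(residual)
    events = []
    for _ in range(n_iter):
        best = None
        for k, g in enumerate(kernels):
            # Inner product of the residual with g at every valid shift.
            corr = np.correlate(residual, g, mode="valid")
            t = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[t]) > abs(best[2]):
                best = (k, t, float(corr[t]))
        k, t, a = best
        if abs(a) < tol:
            break  # nothing left that the dictionary can explain
        g = kernels[k]
        # Move the selected component from the residual to the reconstruction.
        residual[t:t + len(g)] -= a * g
        recon[t:t + len(g)] += a * g
        events.append((k, t, a))
    return events, recon, residual

# Usage: a synthetic signal built from two shifted, scaled kernels.
kernels = [toy_kernel(500.0), toy_kernel(2000.0)]
signal = np.zeros(512)
signal[100:164] += 2.0 * kernels[0]
signal[300:364] += 1.5 * kernels[1]
events, recon, residual = matching_pursuit(signal, kernels, n_iter=10)
```

Each event records which kernel fired, where, and how strongly. Counting kernel indices over such events for clean, degraded, and noise-only inputs is one simple way to compare activation patterns of the sort the abstract describes.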

Files

License info not available