Leveraging spatial cues from cochlear implant microphones to efficiently enhance speech separation in naturalistic listening scenes

Journal Article (2026)
Author(s)

Feyisayo Olalere (Radboud Universiteit Nijmegen)

Kiki van der Heijden (Columbia University, Radboud Universiteit Nijmegen)

H. Christiaan Stronks (Leiden University Medical Center)

Jeroen Briaire (Leiden University Medical Center)

Johan H.M. Frijns (Universiteit Leiden, TU Delft - Electrical Engineering, Mathematics and Computer Science, Leiden University Medical Center)

Marcel van Gerven (Radboud Universiteit Nijmegen)

Research Group
Bio-Electronics
DOI related publication
https://doi.org/10.1038/s41598-025-31999-8 Final published version
Publication Year
2026
Language
English
Journal title
Scientific Reports
Issue number
1
Volume number
16
Article number
2255
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Despite the success of speech separation approaches for dry (non-reverberant) speech mixtures, speech separation in naturalistic, spatial, and reverberant acoustic environments remains challenging. This limits the effectiveness of current speech separation methods for assistive hearing devices as well as neuroprosthetic devices such as cochlear implants (CIs). Here, we investigate whether a deep neural network (DNN) model for speech separation can utilize the spatial information in naturalistic listening scenes, as captured by a CI’s microphones, to improve separation performance. We examined the impact of latent spatial cues (inherently present in two-channel speech mixtures, but which the model must learn from these mixtures) as well as pre-computed spatial cues added to the speech mixtures as auxiliary input features (inter-channel level differences, ILDs, and inter-channel phase differences, IPDs). Specifically, we introduce a two-channel version of the SuDoRM-RF speech separation model, which takes as input speech mixtures recorded with two CI microphones, and show that latent spatial cues enhance separation performance without affecting model efficiency in terms of model complexity and inference latency. Pre-computed spatial cues, especially IPDs, enhanced separation performance even more, but simultaneously reduced model efficiency. Finally, simulating a CI user’s listening experience with a vocoder showed that the beneficial effect of spatial cues on DNN speech separation persists even if the separated speech streams are spectrotemporally degraded as in the output of a CI.
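
For readers unfamiliar with the pre-computed cues named in the abstract, the following is a minimal sketch of how ILDs and IPDs are commonly derived from a two-channel recording. It is not the authors' implementation: the function names (`stft`, `spatial_cues`), the channel names `front` and `rear`, the STFT settings (512-sample Hann window, hop of 128) and the 16 kHz sampling rate in the usage example are all assumptions chosen for illustration.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Naive STFT: Hann-windowed frames followed by a one-sided FFT.

    Returns a complex array of shape (n_frames, n_fft // 2 + 1).
    Window and hop sizes are illustrative assumptions.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def spatial_cues(front, rear, eps=1e-8):
    """Compute per time-frequency-bin ILD (dB) and IPD (radians).

    `front` and `rear` are hypothetical names for the two microphone
    signals; `eps` guards against division by zero in silent bins.
    """
    X1, X2 = stft(front), stft(rear)
    # Level difference: ratio of magnitudes, expressed in decibels.
    ild = 20.0 * np.log10((np.abs(X1) + eps) / (np.abs(X2) + eps))
    # Phase difference: angle of the cross-spectrum, wrapped to (-pi, pi].
    ipd = np.angle(X1 * np.conj(X2))
    return ild, ipd


# Usage: a 1 kHz tone where the second channel is half as loud and
# lags by a quarter of pi in phase (assumed 16 kHz sampling rate).
fs = 16000
t = np.arange(fs) / fs
front = np.sin(2 * np.pi * 1000 * t)
rear = 0.5 * np.sin(2 * np.pi * 1000 * t - np.pi / 4)
ild, ipd = spatial_cues(front, rear)
tone_bin = int(round(1000 * 512 / fs))  # FFT bin containing the tone
print(ild[:, tone_bin].mean(), ipd[:, tone_bin].mean())
```

In the tone's frequency bin the ILD should come out near 6 dB (a factor of two in amplitude) and the IPD near π/4. Features of this kind would typically be stacked with the mixture representation as extra input channels, which is consistent with the abstract's observation that auxiliary cues add to the model's computational load.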