Leveraging spatial cues from cochlear implant microphones to efficiently enhance speech separation in naturalistic listening scenes

Journal Article (2026)

Author(s)

Feyisayo Olalere (Radboud Universiteit Nijmegen)

Kiki van der Heijden (Columbia University, Radboud Universiteit Nijmegen)

H. Christiaan Stronks (Leiden University Medical Center)

Jeroen Briaire (Leiden University Medical Center)

Johan H.M. Frijns (Universiteit Leiden, TU Delft - Electrical Engineering, Mathematics and Computer Science, Leiden University Medical Center)

Marcel van Gerven (Radboud Universiteit Nijmegen)

Research Group

Bio-Electronics

DOI related publication

https://doi.org/10.1038/s41598-025-31999-8 (final published version)

To reference this document use

https://resolver.tudelft.nl/uuid:694b2154-6244-4ab6-a621-50deabd27653

Publication Year

2026

Language

English

Journal title

Scientific Reports

Issue number

1

Volume number

16

Article number

2255

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Despite the success of speech separation approaches for dry (non-reverberant) speech mixtures, speech separation in naturalistic, spatial, and reverberant acoustic environments remains challenging. This limits the effectiveness of current speech separation methods for assistive hearing devices as well as neuroprosthetic devices such as cochlear implants (CIs). Here, we investigate whether a deep neural network (DNN) model for speech separation can utilize the spatial information in naturalistic listening scenes, as captured by a CI’s microphones, to improve separation performance. We examined the impact of latent spatial cues (which are inherently present in two-channel speech mixtures but must be learned from them) as well as pre-computed spatial cues added to the speech mixtures as auxiliary input features (inter-channel level differences, ILDs, and inter-channel phase differences, IPDs). Specifically, we introduce a two-channel version of the SuDoRM-RF speech separation model, which takes as input speech mixtures recorded with two CI microphones, and show that latent spatial cues enhance separation performance without affecting model efficiency in terms of model complexity and inference latency. Pre-computed spatial cues, especially IPDs, enhanced separation performance even more but simultaneously reduced model efficiency. Finally, simulating a CI user’s listening experience with a vocoder showed that the beneficial effect of spatial cues on DNN speech separation persists even when the separated speech streams are spectrotemporally degraded, as in the output of a CI.
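
For readers unfamiliar with the pre-computed cues named in the abstract, the sketch below shows one common way to derive inter-channel level and phase differences from a two-channel recording. The abstract does not state how the authors compute these features, so this should be read as an illustrative assumption: the STFT-based definitions, the function names `stft` and `interchannel_cues`, and all parameter values (FFT size, hop size, sampling rate, the toy test signal) are hypothetical and not taken from the paper.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Minimal STFT: Hann-windowed frames -> one-sided FFT, shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def interchannel_cues(ch1, ch2, n_fft=512, hop=128, eps=1e-8):
    """Return (ILD in dB, IPD in radians) for every time-frequency bin.

    Assumed definitions (not confirmed by the source):
      ILD = 20 * log10(|X1| / |X2|)
      IPD = angle(X1 * conj(X2)), wrapped to (-pi, pi]
    """
    X1 = stft(ch1, n_fft, hop)
    X2 = stft(ch2, n_fft, hop)
    ild = 20.0 * np.log10((np.abs(X1) + eps) / (np.abs(X2) + eps))
    ipd = np.angle(X1 * np.conj(X2))
    return ild, ipd

# Toy two-microphone signal (hypothetical): a 1 kHz tone that reaches the
# second channel at half the amplitude and 4 samples later.
fs = 16000
t = np.arange(fs) / fs
ch1 = np.sin(2 * np.pi * 1000 * t)
ch2 = 0.5 * np.roll(ch1, 4)  # circular shift is exact here: the tone is periodic

ild, ipd = interchannel_cues(ch1, ch2)
k = round(1000 * 512 / fs)  # frequency bin that contains the 1 kHz tone

# Halving the amplitude gives an ILD near 20*log10(2) dB, and a 4-sample
# delay at 1 kHz (a quarter period at 16 kHz) gives an IPD near pi/2.
print(f"ILD at 1 kHz: {ild[:, k].mean():.2f} dB")
print(f"IPD at 1 kHz: {ipd[:, k].mean():.3f} rad")
```

In a setup like the one the abstract describes, feature maps of this kind would be supplied to the separation model alongside the two-channel mixture. How they are encoded and combined with the waveform input is a design choice that the abstract does not specify.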

Files

S41598-025-31999-8.pdf

(pdf | 2.26 Mb)