W.B. Kleijn

15 records found

Journal article (2024) - Wangyang Yu, W. Bastiaan Kleijn
Mapping a room impulse response (RIR) to its Ambisonics representation is not always feasible. However, by adding a weak assumption (i.e., the existence of at least two perpendicular walls in the environment), the Ambisonics representation is restricted to be one of a finite set, with known transformations between the set entries. This makes mapping the omnidirectional RIR to the Ambisonics RIR (ARIR) possible. The authors solve the mapping problem with a convolutional neural network and a multi-task variational autoencoder. The room is assumed to be rectangular. The proposed method is based exclusively on the image source method with frequency-independent reflection coefficients. The authors focus on the early part of RIRs, where the directional information lies. This method requires only a single RIR. Generalizing the method to real-world measurements can obviate the need for specialized hardware for Ambisonics measurement. The proposed method can achieve an SNR of 17.62 dB on estimated first-order ARIRs and 16.15 dB on estimated third-order ARIRs. ...
Active control of noise propagating through apertures is commonly realized with closed-loop LMS algorithms. However, these algorithms require a large number of error microphones and provide only local attenuation. Slow convergence and high computational effort are additional disadvantages. We propose a wave-domain approach that converges instantaneously, operates with low computational effort and does not require error microphones. It inherently controls sound in all directions in the far-field. The soundfield from the aperture is matched in a least squares sense with the generated soundfield from the loudspeaker array using orthonormal basis functions. Compensation for algorithmic delay, induced by blockwise processing, can be based on microphone placement or signal prediction, at the cost of a loss in attenuation performance. Our simulation results indicate that wave-domain processing has the potential to outperform LMS-based methods in practical active noise control for apertures. ...
Journal article (2021) - Wangyang Yu, W. Bastiaan Kleijn
We describe a new method to estimate the geometry of a room and its reflection coefficients given room impulse responses. The method utilizes convolutional neural networks to estimate the room geometry and multilayer perceptrons to estimate the reflection coefficients. The mean square error is used as the loss function. In contrast to existing methods, we do not require knowledge of the relative positions of sources and receivers in the room. The method can be used with only a single RIR between one source and one receiver. For simulated environments, the proposed estimation method can achieve an average accuracy of 0.04 m for each dimension in room geometry estimation and 0.09 accuracy in reflection coefficients. For real-world environments, the room geometry estimation method achieves an average accuracy of 0.065 m for each dimension. ...
In this paper, we present a novel derivation of an existing algorithm for distributed optimization termed the primal-dual method of multipliers (PDMM). In contrast to its initial derivation, monotone operator theory is used to connect PDMM with other first-order methods such as Douglas-Rachford splitting and the alternating direction method of multipliers, thus providing insight into its operation. In particular, we show how PDMM combines a lifted dual form with Peaceman-Rachford splitting to facilitate distributed optimization in undirected networks. We additionally demonstrate sufficient conditions for primal convergence for strongly convex differentiable functions and strengthen this result for strongly convex functions with Lipschitz continuous gradients by introducing a primal geometric convergence bound. ...
In this paper, we present a novel method for convex optimization in distributed networks called the distributed method of multipliers (DMM). The proposed method is based on a combination of a particular dual lifting and classic monotone operator splitting approaches to produce an algorithm with guaranteed asymptotic convergence in undirected networks. The proposed method allows any separable convex problem with linear constraints to be solved in undirected networks. In contrast to typical distributed approaches, the structure of the network does not restrict the types of problems that can be solved. Furthermore, the solver can be applied to general separable problems, those with separable convex objectives and constraints, via the use of an additional primal lifting approach. Finally, we demonstrate the use of DMM in solving a number of classic signal processing problems including beamforming, channel capacity maximization and portfolio optimization. ...
Conference paper (2018) - Wangyang Yu, W. Bastiaan Kleijn
We investigate what information about a room is necessary to integrate a new source into an existing scenario. In particular, we consider the effects of the reflection order, the order of ambisonics signals and reverberation time. We conducted a series of listening tests and used the control variates method to determine the quantitative relevance of the selected attributes. In terms of integration and accurate localisation, at least a third-order ambisonics description of a source is required for integration of that source. In addition, a finite number of early reflections can perform as well as a full room impulse response when a new source is integrated into an existing scenario. However, a room impulse response with only the correct reverberation time is not sufficient. ...
Journal article (2018) - Steven Van Kuyk, W. Bastiaan Kleijn, Richard C. Hendriks
We propose a monaural intrusive instrumental intelligibility metric called SIIB (speech intelligibility in bits). SIIB is an estimate of the amount of information shared between a talker and a listener in bits per second. Unlike existing information theoretic intelligibility metrics, SIIB accounts for talker variability and statistical dependencies between time-frequency units. Our evaluation shows that relative to state-of-the-art intelligibility metrics, SIIB is highly correlated with the intelligibility of speech that has been degraded by noise and processed by speech enhancement algorithms. ...
Journal article (2018) - Steven Van Kuyk, W. Bastiaan Kleijn, Richard Christian Hendriks
Instrumental intelligibility metrics are commonly used as an alternative to listening tests. This paper evaluates 12 monaural intrusive intelligibility metrics: SII, HEGP, CSII, HASPI, NCM, QSTI, STOI, ESTOI, MIKNN, SIMI, SIIB, and sEPSMcorr. In addition, this paper investigates the ability of intelligibility metrics to generalize to new types of distortions and analyzes why the top performing metrics have high performance. The intelligibility data were obtained from 11 listening tests described in the literature. The stimuli included Dutch, Danish, and English speech that was distorted by additive noise, reverberation, competing talkers, preprocessing enhancement, and postprocessing enhancement. SIIB and HASPI had the highest performance, achieving average correlations with listening test scores of ρ = 0.92 and ρ = 0.89, respectively. The high performance of SIIB may, in part, be the result of SIIB's developers having access to all the intelligibility data considered in the evaluation. The results show that intelligibility metrics tend to perform poorly on datasets that were not used during their development. By modifying the original implementations of SIIB and STOI, the advantage of reducing statistical dependencies between input features is demonstrated. Additionally, this paper presents a new version of SIIB called SIIBGauss, which has similar performance to SIIB and HASPI, but takes less time to compute by two orders of magnitude. ...
Speech intelligibility enhancement is considered for multiple-microphone acquisition and single loudspeaker rendering. This is based on the mutual information measured between the message spoken in the far-end environment and the message perceived by a listener at the near end. We prove that the joint optimal processing can be decomposed into far-end and near-end processing. The former is a minimum variance distortionless response beamformer that reduces the noise in the talker environment and the latter is a post-filter that redistributes the power over the frequency bands. Disjoint processing is optimal provided that the post-filtering operation is aware of the residual noise from the beamforming operation. Our results show that both processing steps are necessary for the effective conveyance of a message and, importantly, that the second step must be aware of the remaining noise from the beamforming operation in the first step. In addition, we study the use of the mutual information applied to the perceptually more relevant powers per critical band. ...
Conference paper (2017) - Steven Van Kuyk, W. Bastiaan Kleijn, Richard C. Hendriks
The key to the success of speech-based technology is an understanding of human speech communication. While significant advances have been made, a unified theory of speech communication that is both comprehensive and quantitative is yet to emerge. In this paper we approach speech communication from an information theoretical perspective. Without relying on prior knowledge of speech production, language, or auditory processing, we develop a new methodology for measuring the information rate of speech. Instead we rely on having recordings of multiple talkers saying the same utterance. In general, our results are consistent with a linguistic understanding of speech communication. ...
Conference paper (2017) - Christos Tzagkarakis, W. Bastiaan Kleijn, Jan Skoglund
This paper addresses the problem of joint wideband localization and acquisition of acoustic sources. The source locations as well as acquisition of the original source signals are obtained in a joint fashion by solving a sparse recovery problem. Spatial sparsity is enforced by discretizing the acoustic scene into a grid of predefined dimensions. In practice, energy leakage from the source location to the neighboring grid points is expected to produce spurious location estimates, since the source location will not coincide with one of the grid points. To alleviate this problem we introduce the concept of grid-shift. A particular source is then near a point on the grid in at least one of a set of shifted grids. For the selected grid, other sources will generally not be on a grid point, but their energy is distributed over many points. A large number of experiments on real speech signals show the localization and acquisition effectiveness of the proposed approach under clean, noisy and reverberant conditions. ...
The processing required for the global maximization of the intelligibility of speech acquired by multiple microphones and rendered by a single loudspeaker is considered in this paper. The intelligibility is quantified based on the mutual information rate between the message spoken by the talker and the message as interpreted by the listener. We prove that then, in each of a set of narrow-band channels, the processing can be decomposed into a minimum variance distortionless response (MVDR) beamforming operation that reduces the noise in the talker environment, followed by a gain operation that, given the far-end noise and beamforming operation, accounts for the noise at the listener end. Our experiments confirm that both processing steps are necessary for the effective conveyance of a message and, importantly, that the second step must be aware of the first step. ...
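For readers unfamiliar with the MVDR beamformer named in the abstract above, the following is a minimal sketch of the textbook narrow-band MVDR weights, w = R^{-1} d / (d^H R^{-1} d). It is not the authors' implementation: the function name `mvdr_weights`, the steering vector, and the noise covariance are all illustrative assumptions, and the near-end gain stage described in the abstract is not modelled.

```python
import numpy as np


def mvdr_weights(noise_cov, steering):
    """Textbook MVDR weights for one narrow band: R^{-1} d / (d^H R^{-1} d)."""
    r_inv_d = np.linalg.solve(noise_cov, steering)
    return r_inv_d / (steering.conj() @ r_inv_d)


rng = np.random.default_rng(0)
num_mics = 4

# Hypothetical steering vector (unit-magnitude phase terms), for illustration only.
steering = np.exp(-1j * np.pi * 0.3 * np.arange(num_mics))

# Hypothetical Hermitian positive-definite noise covariance matrix.
a = rng.standard_normal((num_mics, 8)) + 1j * rng.standard_normal((num_mics, 8))
noise_cov = a @ a.conj().T / 8 + 0.1 * np.eye(num_mics)

weights = mvdr_weights(noise_cov, steering)

# Distortionless constraint: the response towards the source is exactly 1.
response = weights.conj() @ steering

# Output noise power of MVDR versus a plain delay-and-sum beamformer.
mvdr_noise = np.real(weights.conj() @ noise_cov @ weights)
das_weights = steering / num_mics
das_noise = np.real(das_weights.conj() @ noise_cov @ das_weights)
```

Both beamformers pass the source undistorted, so comparing `mvdr_noise` with `das_noise` shows the noise reduction that the minimum-variance criterion buys under these assumed conditions.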
In this paper we propose a distributed reformulation of the linearly constrained minimum variance (LCMV) beamformer for use in acoustic wireless sensor networks. The proposed distributed minimum variance (DMV) algorithm, for which we demonstrate implementations for both cyclic and acyclic networks, allows the optimal beamformer output to be computed at each node without the need for sharing raw data within the network. By exploiting the low-rank structure of estimated covariance matrices in time-varying noise fields, the algorithm can also provide a reduction in the total amount of data transmitted during computation when compared to centralised solutions. This is particularly true when multiple microphones are used per node. We also compare the performance of DMV with state-of-the-art distributed beamformers and demonstrate that it achieves greater improvements in SNR in dynamic noise fields with similar transmission costs. ...
Conference paper (2016) - T. Sherson, R. Heusdens, W.B. Kleijn
In this paper, we focus on the challenge of processing data generated within decentralised wireless sensor networks in a distributed manner. When the desired operations can be expressed as globally constrained separable convex optimisation problems, we show how we can convert these to extended monotropic programs and exploit Lagrangian duality to form equivalent distributed consensus problems. Such problems can be embedded in sensor network applications via existing solvers such as the alternating direction method of multipliers or the primal dual method of multipliers. We then demonstrate how this approach can be used to solve specific problems including linearly constrained quadratic problems and the classic Gaussian channel capacity maximisation problem in a distributed manner. ...
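As background for the Gaussian channel capacity maximisation problem mentioned in the abstract above, the sketch below shows the classic centralised water-filling solution, which a distributed solver would be expected to reproduce. It is not the distributed method of the paper: the function name `water_filling`, the noise levels, and the power budget are assumptions chosen for illustration.

```python
import numpy as np


def water_filling(noise, total_power):
    """Allocate power over parallel Gaussian channels: p_i = max(level - n_i, 0)."""
    noise = np.asarray(noise, dtype=float)
    ordered = np.sort(noise)
    # Try the k quietest channels, from all channels down to a single one,
    # and stop at the first k whose water level exceeds every active noise level.
    for k in range(len(ordered), 0, -1):
        level = (total_power + ordered[:k].sum()) / k
        if level > ordered[k - 1]:
            break
    return np.maximum(level - noise, 0.0)


# Hypothetical noise variances and power budget.
noise_levels = np.array([1.0, 2.0, 5.0])
budget = 3.0

powers = water_filling(noise_levels, budget)  # -> [2.0, 1.0, 0.0]

# Capacity in bits per channel use for real-valued Gaussian channels.
capacity = 0.5 * np.sum(np.log2(1.0 + powers / noise_levels))
```

In this example the noisiest channel receives no power because its noise level lies above the water level of 3.0.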
Conference paper (2016) - Steven van Kuyk, W. Bastiaan Kleijn, Richard C. Hendriks
Instrumental measures of speech intelligibility typically produce an index between 0 and 1 that is monotonically related to listening test scores. As such, these measures are dimensionless and do not represent physical quantities. In this paper, we propose a new instrumental intelligibility metric that describes speech intelligibility using bits per second. The proposed metric builds upon an existing intelligibility metric that was motivated by information theory. Our main contribution is that we use a statistical model of speech communication that accounts for noise inherent in the speech production process. Experiments show that the proposed metric performs at least as well as existing state-of-the-art intelligibility metrics. ...