Blind Reverberation Time Estimation Using a Convolutional Neural Network with Encoder

Bachelor Thesis (2024)
Author(s)

X. Han (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Jorge Martinez – Mentor (TU Delft - Multimedia Computing)

Dimme de Groot – Mentor (TU Delft - Multimedia Computing)

Maria Soledad Pera – Graduation committee member (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
Publication Year
2024
Language
English
Graduation Date
27-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Accurately estimating reverberation time (RT60) is crucial for improving the acoustic quality of rooms and other environments, because it determines how quickly sound is perceived to decay. Traditional methods, such as Sabine's equation, require extensive prior knowledge of the room and assume ideal conditions, which limits their practicality. To address these limitations, this paper explores blind RT60 estimation with convolutional neural networks (CNNs) extended by a transformer-based encoder. The proposed model uses simulated and real-world datasets and incorporates environmental noise to improve robustness. Results indicate that the CNN-Encoder model achieves a mean squared error (MSE) as low as 0.0006 s² on pure room impulse responses (RIRs) and 0.0011 s² at a signal-to-noise ratio (SNR) of +30 dB. It also shows potential for practical use, achieving an MSE of 0.0282 s² on audio recordings. Compared with a CNN-only architecture, this approach significantly reduces the estimation error, demonstrating the potential for improved acoustic parameter estimation in varied environments. Future work will focus on further optimizing the model for real-world applications and reducing its computational complexity while maintaining high accuracy.
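The quantities named in the abstract can be made concrete with a small sketch. The Python below, assuming NumPy, shows Sabine's equation (RT60 = 0.161 V / A, with room volume V in m³ and total absorption area A in m²), one common way to mix a noise recording into an RIR at a target SNR, and the MSE metric. The function names `sabine_rt60`, `add_noise_at_snr` and `mse` are hypothetical, the exponentially decaying noise used as an RIR is a synthetic stand-in, and this is not the preprocessing used in the thesis, which this record does not describe. Because RT60 is measured in seconds, the MSE comes out in squared seconds.

```python
import numpy as np


def sabine_rt60(volume_m3: float, absorption_area_m2: float) -> float:
    """Sabine's equation: RT60 = 0.161 * V / A (V in m^3, A in m^2)."""
    return 0.161 * volume_m3 / absorption_area_m2


def add_noise_at_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise`, rescaled so that 10*log10(P_signal / P_noise) equals snr_db."""
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as signal")
    noise = noise[: len(signal)]          # trim noise to the signal length
    p_signal = np.mean(signal ** 2)       # mean power of the clean signal
    p_noise = np.mean(noise ** 2)         # mean power of the unscaled noise
    # Gain that brings the noise power down to p_signal / 10^(snr_db / 10).
    gain = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + gain * noise


def mse(pred, target) -> float:
    """Mean squared error; for RT60 values in seconds the result is in s^2."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16_000                      # sample rate in Hz (assumed)
    t = np.arange(fs) / fs           # 1 s time axis
    rt60 = 0.5
    # Synthetic RIR: noise whose amplitude envelope falls by 60 dB after rt60 seconds.
    rir = rng.standard_normal(fs) * 10 ** (-3 * t / rt60)
    noisy = add_noise_at_snr(rir, rng.standard_normal(fs), snr_db=30.0)
    achieved = 10 * np.log10(np.mean(rir ** 2) / np.mean((noisy - rir) ** 2))
    print(f"Sabine RT60: {sabine_rt60(200.0, 40.0):.3f} s")
    print(f"achieved SNR: {achieved:.2f} dB")
    print(f"MSE: {mse([0.50, 0.60], [0.52, 0.58]):.4f} s^2")
```

Running the script should report an achieved SNR of about 30 dB, which checks the gain formula. Here white Gaussian noise stands in for a real environmental recording; any noise array at least as long as the RIR can be passed instead.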
