Applying Large-Scale Weakly Supervised Automatic Speech Recognition to Air Traffic Control
J.L.P.M. van Doorn (TU Delft - Aerospace Engineering)
Junzi Sun – Mentor (TU Delft - Control & Simulation)
J.M. Hoekstra – Graduation committee member (TU Delft - Control & Simulation)
Patrick Jonk – Mentor (Royal Netherlands Aerospace Centre NLR)
Vincent de Vries – Graduation committee member (Royal Netherlands Aerospace Centre NLR)
Whisper Large V2 - ATCO2
https://www.doi.org/10.57967/hf/1376
Whisper Large V2 - ATCOSIM
https://www.doi.org/10.57967/hf/1374
Whisper Large V2 - ATCO2 - ATCOSIM
https://www.doi.org/10.57967/hf/1375
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
The application of automatic speech recognition in the air traffic control domain has been researched extensively. However, its primary application remains in the training and simulation of air traffic controllers. This is due to the insufficient performance of automatic speech recognition in specific environments, such as air traffic control, where strong performance and safety requirements are paramount. This study demonstrates how a large-scale, weakly supervised automatic speech recognition model, Whisper, could meet these performance requirements and establish a new approach to air traffic control communication. Fine-tuning Whisper in the air traffic control domain resulted in a word error rate of 13.5% on the ATCO2 dataset and 1.17% on the ATCOSIM dataset. Furthermore, the study reveals that fine-tuning with region-specific data can enhance performance by up to 60% in real-world scenarios.
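The word error rate figures quoted above can be read against the standard definition: the number of word-level substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The following is a minimal sketch of that definition, not the study's evaluation code. The function name `word_error_rate`, the lowercase and whitespace normalization, and the example transmission are assumptions made for illustration; the text normalization used in the study is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical ATC-style transmission, invented for this example.
reference = "lufthansa four five six climb flight level three two zero"
hypothesis = "lufthansa four five six climb flight level three zero zero"
print(word_error_rate(reference, hypothesis))  # 0.1 (one substitution in ten words)
```

Under this definition, a word error rate of 13.5% corresponds to roughly one word error per seven to eight reference words, and 1.17% to roughly one per 85.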