Applying Large-Scale Weakly Supervised Automatic Speech Recognition to Air Traffic Control


Abstract

Automatic speech recognition (ASR) has been researched extensively in the air traffic control (ATC) domain. However, its primary application remains the training and simulation of air traffic controllers, because ASR performance is still insufficient for environments such as ATC, where accuracy and safety requirements are strict. This study demonstrates how Whisper, a large-scale, weakly supervised ASR model, could meet these performance requirements and establish a new approach to ATC communication. Fine-tuning Whisper on ATC data resulted in a word error rate of 13.5% on the ATCO2 dataset and 1.17% on the ATCOSIM dataset. Furthermore, the study shows that fine-tuning with region-specific data can improve performance by up to 60% in real-world scenarios.
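
For readers unfamiliar with the metric, the word error rate quoted above is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in the study. The function name `word_error_rate`, the lowercase-and-split normalization, and the example utterances are assumptions made purely for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    # The study may normalize transcripts differently.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example utterances, not taken from ATCO2 or ATCOSIM.
reference = "climb flight level one zero zero"
hypothesis = "climb flight level one two zero"
wer = word_error_rate(reference, hypothesis)  # one substitution in six words
```

Under this definition, a word error rate of 1.17% corresponds to roughly one word error per 85 reference words, and 13.5% to roughly one per 7.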
