Whisper-ATC


Open Models for Air Traffic Control Automatic Speech Recognition with Accuracy

Conference Paper (2024)

Author(s)

Jan van Doorn (Student TU Delft)

Junzi Sun (TU Delft - Control & Simulation)

J.M. Hoekstra (TU Delft - Air Transport & Operations)

Patrick Jonk (Royal Netherlands Aerospace Centre)

Vincent de Vries (Royal Netherlands Aerospace Centre)

Research Group

Control & Simulation

Keywords

Machine learning, Automatic speech recognition, Air traffic control, Whisper

To reference this document use:

https://resolver.tudelft.nl/uuid:8e02d222-5775-441d-94d2-96c26156cf43

Publication Year

2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Recent advances in machine learning have produced new architectures, such as encoder-decoder transformers, for automatic speech recognition (ASR). For generic speech, these models already achieve very high accuracy. In air traffic control (ATC), however, ASR has traditionally relied on domain-specific models built from limited training data. This study applies the newly developed transformer architecture to ATC and provides a set of fully open ASR models with high accuracy. The paper demonstrates how Whisper, a large-scale, weakly supervised ASR model, is fine-tuned on several ATC datasets to improve its performance, and it evaluates Whisper models of different sizes. The resulting models achieve word error rates (WERs) of 13.5% on the ATCO2 dataset and 1.17% on the ATCOSIM dataset with a random split (3.88% with a speaker split). The study also shows that fine-tuning with region-specific data can improve performance by up to 60% in real-world scenarios. Finally, we have open-sourced the code base and the models for future research.
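The results above are reported as word error rates. WER is conventionally the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below illustrates that standard definition only. The function names are hypothetical and not taken from the Whisper-ATC code base. It also assumes that normalization is limited to lowercasing and whitespace tokenization and that edits are pooled over the whole corpus; the paper's own evaluation scripts may normalize transcripts differently.

```python
from typing import Iterable, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions
    turning `ref` into `hyp` (Levenshtein distance over words)."""
    # prev[j] = distance between the reference prefix processed so far
    # and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # delete reference word r
                curr[j - 1] + 1,     # insert hypothesis word h
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled WER over (reference, hypothesis) pairs: total edits divided
    by total reference words. Assumed normalization: lowercase + split."""
    edits = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        edits += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return edits / ref_words


if __name__ == "__main__":
    # Hypothetical ATC-style utterance with one dropped word ("flight").
    ref = "KLM one two three descend flight level eight zero"
    hyp = "klm one two three descend level eight zero"
    print(f"WER = {corpus_wer([(ref, hyp)]):.3f}")  # 1 deletion / 9 words
```

Pooling edits over all utterances weights long transmissions more heavily than averaging per-utterance WERs would. Because insertions count as errors, WER can exceed 100% when a model hallucinates many extra words.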

Files

ICRAT2024_paper_83.pdf

(PDF, 0.349 MB)

License info not available