Structured Command Extraction from ATC Communications Using Open and Fine-Tuned Language Models

Conference Paper (2025)
Author(s)

Ana Maria Mekerishvili (Student TU Delft)

Junzi Sun (TU Delft - Aerospace Engineering)

Patrick Jonk (Royal Netherlands Aerospace Centre)

Vincent de Vries (Royal Netherlands Aerospace Centre)

Research Group
Operations & Environment
Publication Year
2025
Language
English
Event
15th SESAR Innovation Days, SIDs 2025 (2025-12-01 - 2025-12-04), Bled, Slovenia
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Radiotelephony remains the primary medium for pilot-controller communication, yet extracting structured information from these spoken exchanges is challenging. Deep learning approaches often depend on large annotated datasets, which limits their use in data-scarce environments. This study evaluates open-source Large Language Models (LLMs) for Structured Information Extraction from air traffic control (ATC) communications, with applications in assisting or automating pseudo-pilot tasks. We evaluate Llama 3.3 (70B) with baseline prompting and Gemma 3 (4B) in baseline and fine-tuned variants on 496 utterances from NARSIM (NLR ATC real-time simulator), NLR's ATM simulator. Performance is assessed on human transcripts and on automatic speech recognition (ASR) outputs from Whisper models, with varying prompt contexts. Cross-sector generalization is tested across two ATC sectors. Under manual scoring, Llama 3.3 achieves a micro-F1 of 0.95 on human transcripts and 0.86 on outputs from fine-tuned Whisper. Gemma 3 performs worse in its baseline form, but fine-tuning on a small sample leads to notable improvements. The results demonstrate the potential of LLMs for ATC applications without the need for large annotated datasets.
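
For readers unfamiliar with the micro-F1 metric reported above, the following is a minimal sketch of how it can be computed over extracted command fields. It is not the authors' evaluation procedure (the paper uses manual scoring). The exact-match rule, the function name `micro_f1`, and the field names and values in the example are all hypothetical and chosen only for illustration.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over (field, value) pairs.

    gold, pred: lists of sets, one set of (field, value) tuples per utterance.
    Counts are pooled across all utterances before precision and recall
    are computed, which is what makes the average "micro".
    Assumption: a predicted pair counts as correct only on exact match.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # pairs present in both gold and prediction
        fp += len(p - g)   # predicted pairs that are not in gold
        fn += len(g - p)   # gold pairs that the prediction missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: two utterances, one wrong altitude value.
gold = [
    {("callsign", "KLM123"), ("altitude", "FL80")},
    {("callsign", "TRA45"), ("heading", "270")},
]
pred = [
    {("callsign", "KLM123"), ("altitude", "FL90")},
    {("callsign", "TRA45"), ("heading", "270")},
]
score = micro_f1(gold, pred)  # 3 TP, 1 FP, 1 FN -> 0.75
```

Because counts are pooled before averaging, utterances with many fields weigh more heavily than short ones, which differs from a macro average computed per utterance or per field type.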