Structured Command Extraction from ATC Communications Using Open and Fine-Tuned Language Models

Conference Paper (2025)
Author(s)

Ana Maria Mekerishvili (Student TU Delft)

Junzi Sun (TU Delft - Aerospace Engineering)

Patrick Jonk (Royal Netherlands Aerospace Centre)

Vincent de Vries (Royal Netherlands Aerospace Centre)

Research Group
Operations & Environment
Publication Year
2025
Language
English
Event
15th SESAR Innovation Days, SIDs 2025 (2025-12-01 - 2025-12-04), Bled, Slovenia
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Radiotelephony remains the primary medium for pilot-controller communication, yet extracting structured information from spoken exchanges is challenging. Deep learning approaches often depend on large annotated datasets, which limits their use in data-scarce environments. This study evaluates open-source Large Language Models (LLMs) for Structured Information Extraction from air traffic control (ATC) communications, with applications in assisting or automating pseudo-pilot tasks. We evaluate Llama 3.3 (70B) with baseline prompting and Gemma 3 (4B) in baseline and fine-tuned variants on 496 utterances from NLR’s ATM simulator, NARSIM (NLR ATC real-time simulator). Performance is assessed on human transcripts and on automatic speech recognition (ASR) outputs from Whisper models, with varying prompt contexts. Cross-sector generalization is tested across two ATC sectors. Using manual scoring, Llama 3.3 achieves a micro-F1 of 0.95 on human transcripts and 0.86 on outputs from fine-tuned Whisper. While Gemma 3 performs worse in its baseline form, fine-tuning on a small sample leads to notable improvements. The results demonstrate the potential of LLMs for ATC applications without the need for large annotated datasets.
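
For readers unfamiliar with the micro-F1 metric reported in the abstract, the following is a minimal sketch of how a micro-averaged F1 score can be computed over extracted fields. It is an illustration only: the study reports manual scoring, and the slot names (`callsign`, `command`, `value`), the example utterances, and the `micro_f1` helper are hypothetical assumptions, not the authors' schema or evaluation code.

```python
from collections import Counter


def micro_f1(gold, pred):
    """Micro-averaged F1 over per-utterance lists of (slot, value) pairs.

    True positives, false positives, and false negatives are pooled across
    all utterances before precision and recall are computed.
    """
    tp = fp = fn = 0
    for gold_items, pred_items in zip(gold, pred):
        gold_count, pred_count = Counter(gold_items), Counter(pred_items)
        # Items present in both gold and prediction count as true positives.
        overlap = sum((gold_count & pred_count).values())
        tp += overlap
        fp += sum(pred_count.values()) - overlap
        fn += sum(gold_count.values()) - overlap
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for two utterances (not taken from the paper).
gold = [
    [("callsign", "KLM123"), ("command", "descend"), ("value", "FL80")],
    [("callsign", "TRA45"), ("command", "heading"), ("value", "270")],
]
pred = [
    [("callsign", "KLM123"), ("command", "descend"), ("value", "FL90")],
    [("callsign", "TRA45"), ("command", "heading"), ("value", "270")],
]

score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.3f}")
```

Because counts are pooled before averaging, utterances with more extracted fields carry more weight than they would under a macro average, where each utterance or class contributes equally.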