Structured Command Extraction from Air Traffic Control Communications Using Large Language Models
A.M. Mekerishvili (TU Delft - Aerospace Engineering)
Junzi Sun – Mentor (TU Delft - Operations & Environment)
Patrick Jonk – Mentor (Royal Netherlands Aerospace Centre NLR)
Abstract
Radiotelephony (RT) remains the primary medium for pilot-controller communication, yet extracting structured information from spoken exchanges is challenging. Deep learning approaches often depend on large annotated datasets, limiting their use in data-scarce environments. This study evaluates open-source large language models (LLMs) for Structured Information Extraction (SIE) from ATC communications, with applications in assisting or automating pseudo-pilot tasks. We evaluate Llama 3.3 (70B) with baseline prompting, and Gemma-3 (4B) in both baseline and fine-tuned variants, on approximately 500 utterances from NLR's ATM simulator. Performance is assessed on human transcripts and on ASR outputs from Whisper models, under varying prompt contexts. Cross-sector generalization is tested across two ATC sectors. Under manual scoring, Llama 3.3 achieves a micro-F1 of 0.95 on human transcripts and 0.86 on fine-tuned Whisper outputs. While Gemma-3 performs worse in its baseline form, fine-tuning on a small sample leads to notable improvements. The results demonstrate the potential of LLMs for ATC applications without the need for large annotated datasets.
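The micro-F1 scores reported above pool matches across all utterances before computing a single precision and recall. The sketch below illustrates this metric over extracted command slots; the slot names (callsign, command, value) and the exact matching scheme are illustrative assumptions, not the thesis's annotation schema.

```python
# Minimal sketch of micro-averaged F1 over extracted ATC command slots.
# Slot names and the strict key-value matching below are illustrative
# assumptions, not the exact scoring protocol used in the study.

def micro_f1(gold, pred):
    """Pool true positives, false positives, and false negatives
    across all utterances, then compute a single F1 score."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_items, p_items = set(g.items()), set(p.items())
        tp += len(g_items & p_items)   # slots extracted correctly
        fp += len(p_items - g_items)   # spurious or wrong extractions
        fn += len(g_items - p_items)   # missed gold slots
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: two utterances, one predicted slot wrong in the second.
gold = [
    {"callsign": "KLM123", "command": "descend", "value": "FL80"},
    {"callsign": "TRA45", "command": "turn_left", "value": "230"},
]
pred = [
    {"callsign": "KLM123", "command": "descend", "value": "FL80"},
    {"callsign": "TRA45", "command": "turn_right", "value": "230"},
]
print(round(micro_f1(gold, pred), 3))  # 5 TP, 1 FP, 1 FN -> 0.833
```

Because errors are pooled globally rather than averaged per utterance, micro-F1 weights long, slot-rich transmissions more heavily than short acknowledgements.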