Beyond Real Traffic
Assessing the Reliability of AI-Generated Network Data in Deep Learning-Based Intrusion Detection Models
H.E.J. Bosma (TU Delft - Electrical Engineering, Mathematics and Computer Science)
K. Liang – Mentor (TU Delft - Cyber Security)
G. Smaragdakis – Graduation committee member (TU Delft - Cyber Security)
J.H.G. Dauwels – Graduation committee member (TU Delft - Signal Processing Systems)
R. Wang – Mentor (TU Delft - Cyber Security)
Abstract
This thesis investigates how reliably Large Language Model (LLM)-generated data can be used to train deep learning-based Intrusion Detection Systems (IDS), moving beyond traditional datasets of real traffic. In the context of a small distributed environmental measurement application, application-layer sensor data (temperature, humidity, and particulate matter) and the corresponding HTTP Network Traffic Telemetry (NTT) were collected over one week using Raspberry Pi measurement stations and Zeek. Two Long Short-Term Memory (LSTM) models were trained: an Application Model (AM) for sensor anomalies and a Network Traffic Model (NTM) for network anomalies, which were combined in a voting-based IDS that outputs a trust score per data source. Using a structured prompting strategy, a publicly available LLM was then employed to generate synthetic counterparts of the sensor and NTT datasets. The similarity between the real and synthetic data distributions was quantified using the Wasserstein distance, after which two experiment series were conducted: (1) progressively replacing real samples with synthetic ones while keeping the training-set size fixed, and (2) augmenting the real data with increasing fractions of synthetic samples. Results show that replacing more than roughly 10% of the AM training data degrades detection performance, whereas the NTM remains robust until the real data is nearly fully replaced. In contrast, augmenting rather than replacing the real data preserves, and in some cases modestly improves, IDS performance. Overall, the findings indicate that LLM-generated data can effectively complement, but not fully replace, real measurements when carefully integrated into IDS training pipelines.
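The sketch below is not the thesis code, but a minimal illustration of the two quantitative steps the abstract describes: comparing real and LLM-generated feature distributions with the (1-D, per-feature) Wasserstein distance, and constructing the mixed training sets used in the two experiment series. The column names and the DataFrame layout are illustrative assumptions.

```python
import pandas as pd
from scipy.stats import wasserstein_distance


def per_feature_wasserstein(real: pd.DataFrame, synthetic: pd.DataFrame) -> dict:
    """1-D Wasserstein distance for each column shared by the real and synthetic sets."""
    return {
        col: wasserstein_distance(real[col].to_numpy(), synthetic[col].to_numpy())
        for col in real.columns.intersection(synthetic.columns)
    }


def replace_with_synthetic(real: pd.DataFrame, synthetic: pd.DataFrame,
                           fraction: float, seed: int = 0) -> pd.DataFrame:
    """Experiment series 1: keep the training-set size fixed, but swap
    `fraction` of the real rows for randomly drawn synthetic rows."""
    n_replace = int(len(real) * fraction)
    kept_real = real.sample(len(real) - n_replace, random_state=seed)
    drawn_synth = synthetic.sample(n_replace, replace=True, random_state=seed)
    return pd.concat([kept_real, drawn_synth], ignore_index=True)


def augment_with_synthetic(real: pd.DataFrame, synthetic: pd.DataFrame,
                           fraction: float, seed: int = 0) -> pd.DataFrame:
    """Experiment series 2: keep all real rows and append an extra
    `fraction` * len(real) synthetic rows."""
    n_add = int(len(real) * fraction)
    extra = synthetic.sample(n_add, replace=True, random_state=seed)
    return pd.concat([real, extra], ignore_index=True)


# Example usage with hypothetical sensor columns (temperature, humidity, pm):
# distances = per_feature_wasserstein(real_df, synthetic_df)
# train_10pct_replaced = replace_with_synthetic(real_df, synthetic_df, fraction=0.10)
# train_10pct_augmented = augment_with_synthetic(real_df, synthetic_df, fraction=0.10)
```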