Synthetic Data for Robust Language Modeling

doi:10.4233/uuid:bea358f8-ff6f-43be-a065-a6e1a0b3bc5b

Doctoral Thesis (2026)

Author(s)

P. Lippmann (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

G.J.P.M. Houben – Promotor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J. Yang – Copromotor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group

Web Information Systems

Synthetic Data, Language Model Robustness, Value-Sensitive Design, Human-AI Collaboration, Language Model Interpretability, Language Model Reasoning, Knowledge Injection

DOI related publication

https://doi.org/10.4233/uuid:bea358f8-ff6f-43be-a065-a6e1a0b3bc5b Final published version

To reference this document use

https://doi.org/10.4233/uuid:bea358f8-ff6f-43be-a065-a6e1a0b3bc5b

More Info

Publication Year

2026

Language

English

Defense Date

01-06-2026

Awarding Institution

Delft University of Technology

ISBN (electronic)

978-94-6518-333-6

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

How do we ensure large language models are genuinely robust, rather than just performing well on benchmarks? This work investigates critical vulnerabilities of modern LLMs, from their tendency to mimic reasoning styles without logical substance to their susceptibility to high-confidence blind spots. By introducing targeted synthetic data generation, agent-guided knowledge injection, and value-sensitive escalation policies, this thesis offers a holistic approach to AI reliability. It provides actionable frameworks to localize brittleness, correct unknown unknowns, and navigate uncertain, high-stakes deployments with auditable, human-aligned decision-making.

Files

Main_print_copy.pdf

(pdf | 18.7 MB)

License info not available