Multi-Source Knowledge Distillation for Robust and Efficient Machine Learning

doi:10.4233/uuid:cc93e491-2ef6-4b0f-8293-db90f7014173

Doctoral Thesis (2026)

Author(s)

C. Hong (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

D.H.J. Epema – Promotor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Y. Chen – Promotor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group

Data-Intensive Systems

Probabilistic inference, Model compression, Knowledge distillation, Diffusion models, Data-free attack, Black-box attack, White-box attack

DOI related publication

https://doi.org/10.4233/uuid:cc93e491-2ef6-4b0f-8293-db90f7014173 (Final published version)

To reference this document use

https://doi.org/10.4233/uuid:cc93e491-2ef6-4b0f-8293-db90f7014173

Publication Year

2026

Language

English

Defense Date

18-06-2026

Awarding Institution

Delft University of Technology

ISBN (electronic)

978-94-6518-326-8

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Knowledge distillation, the process of transferring learned knowledge from a target (data or a model) to a substitute, has become essential for reducing computational cost while maintaining accuracy. However, the knowledge to be distilled differs across data quality (noisy/clean), task type (classification/generation), and model accessibility (black-box/white-box), and each variation introduces distinct challenges. This thesis therefore systematically investigates how to distill knowledge from multiple sources (noisy crowdsourced labels, black-box classifiers, white-box generative models, and more complex diffusion models) to improve both robustness and efficiency.

To address these challenges, this thesis poses five research questions and combines theoretical analysis with empirical validation across diverse machine learning scenarios. The first challenge concerns noisy crowdsourced labels, where non-professional workers introduce errors that degrade model performance; it calls for online aggregation methods that process data incrementally rather than over the whole set at once. The second challenge involves black-box model distillation without real data, where efficiently generating high-quality synthetic queries remains difficult. The third extends this setting by incorporating semantic information from public data, aiming to reduce the number of queries typically required for effective distillation. The fourth concerns generative model distillation, asking whether dark knowledge (inference probabilities) exists beyond final outputs and how it improves generalization. The fifth examines diffusion models, whose multi-step Markov chain structure introduces unique difficulties for distillation and sampling acceleration.

Chapter 2 tackles distilling knowledge from noisy crowdsourced labels. Unlike offline aggregation methods, which require all labels at once, we propose BILA, an online framework that processes label chunks incrementally using a confusion-matrix-based neural network model that can be trained with first-order stochastic optimizers. BILA achieves higher accuracy than existing offline algorithms, enabling robust real-time label cleaning.
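
To make the idea of incremental, confusion-matrix-based aggregation concrete, the following is a minimal sketch in Python. It is not BILA: it swaps the neural network model and stochastic optimizer for a simple count-based update, and the class name, method names, and the `smoothing` and `diag_bonus` parameters are all hypothetical. It only illustrates how per-worker confusion matrices can be refined chunk by chunk without revisiting earlier labels.

```python
import math
from collections import defaultdict


class OnlineAggregator:
    """Count-based online label aggregation (illustrative stand-in, not BILA)."""

    def __init__(self, num_classes, smoothing=1.0, diag_bonus=2.0):
        self.k = num_classes
        # counts[w][t][l]: pseudo-count of worker w answering l when the truth is t.
        # The diagonal bonus encodes the assumption that workers beat chance.
        self.counts = defaultdict(
            lambda: [[smoothing + (diag_bonus if t == l else 0.0)
                      for l in range(num_classes)]
                     for t in range(num_classes)])

    def _confusion(self, worker, t, l):
        # Estimated P(worker answers l | true class is t).
        row = self.counts[worker][t]
        return row[l] / sum(row)

    def posterior(self, votes):
        """votes: list of (worker, label) for one item -> P(true class | votes)."""
        log_p = [0.0] * self.k  # uniform prior over classes
        for worker, label in votes:
            for t in range(self.k):
                log_p[t] += math.log(self._confusion(worker, t, label))
        m = max(log_p)
        weights = [math.exp(v - m) for v in log_p]
        z = sum(weights)
        return [w / z for w in weights]

    def update(self, chunk):
        """chunk: dict item -> votes. Returns estimated labels for this chunk only."""
        estimates = {}
        for item, votes in chunk.items():
            post = self.posterior(votes)
            estimates[item] = max(range(self.k), key=post.__getitem__)
            # Soft-count update: each vote is credited to every candidate true
            # class in proportion to the current posterior.
            for worker, label in votes:
                for t in range(self.k):
                    self.counts[worker][t][label] += post[t]
        return estimates


# Chunks arrive one at a time; earlier chunks are never reprocessed.
aggregator = OnlineAggregator(num_classes=2)
first = aggregator.update({"item-1": [("a", 0), ("b", 0), ("c", 1)]})
second = aggregator.update({"item-2": [("a", 1), ("c", 0)]})
```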

Chapter 3 addresses black-box distillation without access to real training data. Existing methods explore the input space inefficiently. We propose TANDEMGAN, which combines exploration, generating diverse synthetic queries, with exploitation, focusing on high-confidence queries. This tandem architecture enables effective substitute model training in general adversarial scenarios where only class labels are available.

Chapter 4 further improves black-box efficiency by incorporating semantic information from public data. We introduce AEDM, which leverages pre-trained diffusion models to generate semantically rich query images resembling real data. By optimizing the input noise of the diffusion model based on substitute model feedback, AEDM achieves superior distillation accuracy with significantly fewer queries and extends to federated learning settings.

Chapter 5 provides a theoretical analysis for generative model distillation. We derive a risk bound demonstrating that incorporating dark knowledge, that is, the underlying conditional distributions between inputs and outputs, improves generalization. Our DKtill framework aligns the probabilistic relationships of student and teacher, outperforming methods that rely solely on final outputs across GANs and VAEs.
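
The difference between matching final outputs and matching conditional distributions can be illustrated with a small, generic example. The sketch below is not the DKtill objective; it uses made-up discrete distributions and a plain Kullback-Leibler divergence, and the names `teacher`, `student_a`, and `student_b` are hypothetical. Both students agree with the teacher on the most likely output, yet only one of them is close to the teacher's full distribution.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same outcomes."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def hard_output(p):
    """Index of the most likely outcome, i.e. the 'final output' only."""
    return max(range(len(p)), key=p.__getitem__)


# Hypothetical conditional distributions for one fixed input.
teacher = [0.70, 0.20, 0.10]
student_a = [0.68, 0.21, 0.11]  # close to the teacher everywhere
student_b = [0.98, 0.01, 0.01]  # same top choice, very different shape

# Comparing final outputs cannot tell the two students apart ...
same_final_output = (
    hard_output(student_a) == hard_output(teacher) == hard_output(student_b))

# ... while a divergence over the full distributions can.
gap_a = kl_divergence(teacher, student_a)
gap_b = kl_divergence(teacher, student_b)
```

Under these assumed numbers, `gap_a` is far smaller than `gap_b`, which is the information a loss defined on final outputs alone would discard.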

Chapter 6 targets diffusion models. Unlike prior work that merely mimics outputs, we propose SFDDM, which aligns the Markov chains of student and teacher models. By reparameterizing intermediate inputs and minimizing differences in both output and hidden variables, SFDDM produces high-quality samples with significantly fewer steps. SFDDM enables distillation from the teacher into a student model with any desired step size.
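
As a rough illustration of what it means for a shorter chain to stay aligned with a longer one, the sketch below derives a student noise schedule whose cumulative signal level matches the teacher's at every shared step. This is a standard property of Gaussian forward processes and is not claimed to be SFDDM's construction; the linear schedule, its default values, and all function names are assumptions made for the example.

```python
def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed teacher noise schedule: betas spaced linearly over the chain."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s <= t} (1 - beta_s), the remaining signal level."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def student_schedule(teacher_betas, student_steps):
    """Betas for a shorter chain whose alpha_bar matches the teacher's
    at every stride-th teacher step."""
    total = len(teacher_betas)
    if student_steps <= 0 or total % student_steps != 0:
        raise ValueError("student_steps must be a positive divisor of the teacher length")
    stride = total // student_steps
    teacher_bar = cumulative_alphas(teacher_betas)
    betas, prev = [], 1.0
    for k in range(1, student_steps + 1):
        cur = teacher_bar[k * stride - 1]
        # One student step absorbs `stride` teacher steps:
        # 1 - beta_student = product of the teacher alphas in that segment.
        betas.append(1.0 - cur / prev)
        prev = cur
    return betas


teacher_betas = linear_betas(1000)
student_betas = student_schedule(teacher_betas, 100)
teacher_bar = cumulative_alphas(teacher_betas)
student_bar = cumulative_alphas(student_betas)
```

With this choice, student step k carries the same noise level as teacher step 10k, so intermediate variables of the two chains can be compared at matching noise levels.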

Finally, we summarize the conclusions of this thesis. Two overarching findings emerge: (1) robust distillation requires identifying and extracting the most valuable information from each source, whether through elaborated inputs, probabilistic relationships, or structural alignment; (2) efficient distillation demands methods that match the constraints of each setting, including incremental processing for noisy data, semantic priors for black-box queries, and chain alignment for diffusion models. We also discuss limitations, including narrow architectural choices, dependency on specific probability approximations, and computational overhead, while outlining future directions such as exploring more powerful networks, alternative quality criteria, rigorous proofs for non-probabilistic generators, and extension to guided or latent diffusion models.

Files

CHI_thesis_print.pdf

(pdf | 11.6 MB)

License info not available