C. Hong

6 records found

Doctoral thesis (2026) - C. Hong, D.H.J. Epema, Y. Chen
Knowledge distillation, the process of transferring learned knowledge from one target (data or model) to a substitute, has become essential for improving efficiency: it reduces computational cost while maintaining accuracy. However, knowledge differs across data quality (noisy/clean), task types (classification/generation), and model accessibility (black-box/white-box). These variations introduce distinct challenges. Thus, this thesis systematically investigates how to distill knowledge from multiple sources (noisy crowdsourced labels, black-box classifiers, white-box generative models, and more complex diffusion models) to improve both robustness and efficiency.
To address these challenges, this thesis proposes five research questions, combining theoretical analysis with empirical validation across diverse machine learning scenarios. The first challenge considers noisy crowdsourced labels, where non-professional workers introduce errors that degrade model performance. This calls for online aggregation methods that process data incrementally rather than on the whole set at once. The second challenge involves black-box model distillation without real data, where efficiently generating high-quality synthetic queries remains difficult. The third challenge extends this to incorporating semantic information from public data, aiming to reduce the number of queries typically required for effective distillation. The fourth investigates generative model distillation, asking whether dark knowledge (inference probabilities) exists beyond final outputs and how it improves generalization. The fifth examines diffusion models, whose multi-step Markov chain structure introduces unique difficulties for distillation and sampling acceleration.
Chapter 2 tackles distilling knowledge from noisy crowdsourced labels. Unlike offline aggregation methods, which require all labels at once, we propose BILA, an online framework that processes label chunks incrementally using a confusion-matrix-based neural network model that can be trained by first-order stochastic optimizers. BILA achieves higher accuracy than existing offline algorithms, enabling robust real-time label cleaning.
Chapter 3 addresses black-box distillation without access to real training data. Existing methods explore the input space inefficiently. We propose TANDEMGAN, which combines exploration, which generates diverse synthetic queries, with exploitation, which focuses on high-confidence queries. This tandem architecture enables effective substitute model training in general adversarial scenarios where only class labels are available.
Chapter 4 further improves black-box efficiency by incorporating semantic information from public data. We introduce AEDM, which leverages pre-trained diffusion models to generate semantically rich query images resembling real data. By optimizing the input noise of the diffusion model based on substitute model feedback, AEDM achieves superior distillation accuracy with significantly fewer queries and extends to federated learning settings.
Chapter 5 provides a theoretical analysis for generative model distillation. We derive a risk bound demonstrating that incorporating dark knowledge, that is, the underlying conditional distributions between inputs and outputs, improves generalization. Our DKtill framework aligns student and teacher probabilistic relationships, outperforming methods that rely solely on final outputs across GANs and VAEs.
Chapter 6 targets diffusion models. Unlike prior work that merely mimics outputs, we propose SFDDM, which aligns the Markov chains of student and teacher models. By reparameterizing intermediate inputs and minimizing differences in both output and hidden variables, SFDDM produces high-quality samples with significantly fewer steps. SFDDM enables the distillation from the teacher to a student model with any desired step size.
Finally, we summarize the conclusions of this thesis. Two overarching findings emerge: (1) robust distillation requires identifying and extracting the most valuable information from each source, whether through elaborated inputs, probabilistic relationships, or structural alignment; (2) efficient distillation demands methods that match the constraints of each setting, including incremental processing for noisy data, semantic priors for black-box queries, and chain alignment for diffusion models. We also discuss limitations, including narrow architectural choices, dependency on specific probability approximations, and computational overhead, while outlining future directions such as exploring more powerful networks, alternative quality criteria, rigorous proofs for non-probabilistic generators, and extension to guided or latent diffusion models.
...
Conference paper (2026) - Chi Hong, Jiyue Huang, Robert Birke, Dick Epema, Stefanie Roos, Lydia Y. Chen
While diffusion models generate remarkable synthetic images, a key limitation is their inference inefficiency: sampling requires numerous steps. To accelerate inference while maintaining high-quality synthesis, teacher-student distillation is applied to compress diffusion models in a progressive and binary manner by retraining, e.g., reducing the 1024-step model to a 128-step model in 3 folds. In this paper, we propose a single-fold distillation algorithm, SFDDM, which can flexibly compress the teacher diffusion model into a student model of any desired step, based on reparameterization of the intermediate inputs from the teacher model. To train the student diffusion model, we minimize not only the output distance but also the distance between the distributions of the hidden variables of the teacher and student models. Extensive experiments on four datasets demonstrate that a student model trained by the proposed SFDDM is able to sample high-quality data with the number of steps reduced to less than 1%, thereby reducing inference time. This performance highlights that SFDDM effectively transfers knowledge in single-fold distillation, achieving semantic consistency and meaningful image interpolation. ...
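
The "3 folds" figure in the abstract above follows from halving the step count once per retraining round. The sketch below only reproduces that arithmetic; `halving_folds` is a hypothetical helper name, and the code says nothing about how SFDDM itself is trained.

```python
def halving_folds(teacher_steps: int, student_steps: int) -> int:
    """Count retraining rounds when every round halves the number of steps."""
    if teacher_steps < 1 or student_steps < 1:
        raise ValueError("step counts must be positive")
    folds = 0
    steps = teacher_steps
    while steps > student_steps:
        if steps % 2 != 0:
            raise ValueError("cannot halve an odd number of steps")
        steps //= 2
        folds += 1
    if steps != student_steps:
        raise ValueError("student_steps is not reachable by repeated halving")
    return folds


print(halving_folds(1024, 128))  # 3, matching the example in the abstract
print(halving_folds(1024, 8))    # 7 rounds to get below 1% of 1024 steps
```

Under this halving assumption, a target such as 10 steps cannot be reached at all, which is the kind of restriction a single-fold method with an arbitrary student step count avoids.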

Gradient Inversion of Federated Diffusion Models

Conference paper (2025) - Jiyue Huang, Chi Hong, Stefanie Roos, Lydia Y. Chen
Diffusion models are becoming the most prevalent generative models, producing exceptionally high-quality image data through a stochastic process of diffusion steps based on Gaussian noise. Recent studies explore the federated training of diffusion models, enabling the collaborative training of a model without clients sharing raw data. We demonstrate that even without direct sharing of the data, the shared gradients of federated diffusion models already leak sensitive information about the raw data. We design GIDM, the first gradient inversion attack for diffusion models, which can reconstruct the training data from the shared model updates. GIDM is a two-phase fusion attack that is both efficient and effective. In its first phase, GIDM leverages the trained diffusion model itself as prior knowledge to constrain the inversion search (latent) space, followed by a second phase of pixel-wise fine-tuning. Unlike existing inversion attacks on classification models, inverting diffusion models presents new challenges, most notably that the noise term and the randomly sampled diffusion step are not known to the attacker but are required for the reconstruction. To tackle this challenge, we propose a joint triple-optimization algorithm to approximate the raw data, sampling step, and noise term simultaneously. GIDM is shown to be able to reconstruct images almost identical to the original ones and clearly outperforms baselines, i.e., GIDM without the second phase and state-of-the-art attacks on classifiers adapted to diffusion. The code of our method is available at https://github.com/GillHuang-Xtler/Diffusion_inversion. ...
Conference paper (2023) - J. Xu, C. Hong, J. Huang, Lydia Y. Chen, J.E.A.P. Decouchant
Federated learning is a private-by-design distributed learning paradigm where clients train local models on their own data before a central server aggregates their local updates to compute a global model. Depending on the aggregation method used, the local updates are either the gradients or the weights of local learning models, e.g., FedAvg aggregates model weights. Unfortunately, recent reconstruction attacks apply a gradient inversion optimization on the gradient update of a single mini-batch to reconstruct the private data used by clients during training. As the state-of-the-art reconstruction attacks solely focus on a single update, realistic adversarial scenarios are overlooked, such as observation across multiple updates and updates trained from multiple mini-batches. A few studies consider a more challenging adversarial scenario where only model updates based on multiple mini-batches are observable, and resort to computationally expensive simulation to untangle the underlying samples for each local step. In this paper, we propose AGIC, a novel Approximate Gradient Inversion Attack that efficiently and effectively reconstructs images from either model or gradient updates, and across multiple epochs. In a nutshell, AGIC (i) approximates gradient updates of used training samples from model updates to avoid costly simulation procedures, (ii) leverages gradient/model updates collected from multiple epochs, and (iii) assigns increasing weights to layers with respect to the neural network structure for reconstruction quality. We extensively evaluate AGIC on three datasets, namely CIFAR-10, CIFAR-100 and ImageNet. Our results show that AGIC increases the peak signal-to-noise ratio (PSNR) by up to 50% compared to two representative state-of-the-art gradient inversion attacks. Furthermore, AGIC is faster than the state-of-the-art simulation-based attack, e.g., it is 5x faster when attacking FedAvg with 8 local steps in between model updates. ...
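
Step (i) of the abstract above, approximating a gradient from a model update, can be illustrated with a minimal sketch. It assumes plain SGD with a constant learning rate that the observer knows, and it is not the AGIC algorithm itself; the function names `sgd_local_training` and `approx_mean_gradient` are hypothetical. Under that assumption the weight change equals the learning rate times the sum of the local gradients, so dividing by the learning rate and the number of local steps recovers their mean without simulating each step.

```python
def sgd_local_training(weights, gradients_per_step, lr):
    """Apply one plain SGD update per local step (illustrative client side)."""
    w = list(weights)
    for grad in gradients_per_step:
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def approx_mean_gradient(w_before, w_after, lr, local_steps):
    """Recover the mean local gradient from a weight update.

    Assumes plain SGD with a constant, known learning rate.
    """
    if lr <= 0 or local_steps < 1:
        raise ValueError("lr must be positive and local_steps at least 1")
    scale = lr * local_steps
    return [(b - a) / scale for b, a in zip(w_before, w_after)]


# Toy client: two parameters, two local steps with known gradients.
w0 = [1.0, 2.0]
local_grads = [[0.5, -1.0], [1.5, 1.0]]
w1 = sgd_local_training(w0, local_grads, lr=0.1)

# The observer sees only w0 and w1, yet recovers the mean gradient,
# which is [1.0, 0.0] here up to floating-point error.
estimate = approx_mean_gradient(w0, w1, lr=0.1, local_steps=2)
```

With momentum, weight decay, or an adaptive optimizer the relation between the weight change and the gradients is no longer this simple, so the estimate would only be approximate.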

Client Contribution and Selection in Federated Learning

Conference paper (2023) - Jiyue Huang, Chi Hong, Yang Liu, Lydia Y. Chen, Stefanie Roos
Federated learning (FL) enables collaborative learning between parties, called clients, without sharing the original and potentially sensitive data. To ensure fast convergence in the presence of heterogeneous clients, it is imperative to select, in a timely manner, clients who can effectively contribute to learning. A realistic but overlooked case of heterogeneous clients is that of Mavericks, who monopolize the possession of certain data types, e.g., children's hospitals possess most of the data on pediatric cardiology. In this paper, we address the importance and tackle the challenges of Mavericks by exploring two types of client selection strategies. First, we show theoretically and through simulations that the common contribution-based approach, Shapley Value, underestimates the contribution of Mavericks and is hence not effective as a measure to select clients. Then, we propose FedEMD, an adaptive strategy with competitive overhead based on the Wasserstein distance, supported by a proven convergence bound. As FedEMD adapts the selection probability such that Mavericks are preferably selected when the model benefits from improvement on rare classes, it consistently ensures fast convergence in the presence of different types of Mavericks. Compared to existing strategies, including Shapley Value-based ones, FedEMD improves the convergence speed of neural network classifiers with FedAvg aggregation by 26.9% and its performance is consistent across various levels of heterogeneity. ...
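
To make the Wasserstein (earth mover's) distance mentioned above concrete, here is a minimal sketch that compares a client's label histogram with a global one. It is not FedEMD's selection rule. Two assumptions are made purely for illustration: class indices are placed on a line with unit distance between neighbours (the abstract does not state the ground distance), and the function names `normalize` and `emd_1d` are hypothetical.

```python
from itertools import accumulate


def normalize(counts):
    """Turn label counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]


def emd_1d(p, q):
    """Wasserstein-1 distance between two histograms over the same ordered bins.

    In one dimension with unit spacing this is the sum of absolute
    differences between the two cumulative distributions.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    cdf_gap = accumulate(pi - qi for pi, qi in zip(p, q))
    return sum(abs(gap) for gap in cdf_gap)


global_dist = normalize([100, 100, 100])   # balanced global label distribution
typical_client = normalize([30, 35, 35])   # close to the global mix
maverick_client = normalize([0, 0, 100])   # holds only one class

# The client holding a single class lies much farther from the global mix.
near = emd_1d(typical_client, global_dist)
far = emd_1d(maverick_client, global_dist)
```

A selection strategy could turn such distances into selection probabilities, but how FedEMD does so is described in the paper, not in this listing.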

A variational Bayesian approach

Conference paper (2021) - Chi Hong, Amirmasoud Ghiassi, Yichi Zhou, Robert Birke, Lydia Y. Chen
Noisily labeled data is the norm rather than the exception for crowdsourced content. It is effective to distill noise and infer correct labels by aggregating results from crowd workers. To ensure time relevance and overcome slow responses of workers, online label aggregation is increasingly requested, calling for solutions that can incrementally infer the true label distribution from subsets of data items. In this paper, we propose a novel online label aggregation framework, BiLA, which employs a variational Bayesian inference method and designs a novel stochastic optimization scheme for incremental training. BiLA is flexible enough to accommodate any generating distribution of labels through the exact computation of its posterior distribution. We also derive the convergence bound of the proposed optimizer. We compare BiLA with the state of the art based on minimax entropy, neural networks and expectation maximization algorithms, on synthetic and real-world data sets. Our evaluation results on various online scenarios show that BiLA can effectively infer the true labels, with an error rate reduction of at least 10 and 1.5 percentage points for synthetic and real-world datasets, respectively. ...
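
The idea of inferring a true label from noisy worker answers, and of updating that inference as answers arrive, can be sketched with a plain Bayes update. This is not BiLA: the sketch assumes each worker's confusion matrix is already known, whereas an aggregation method has to infer worker reliability from the data. The name `posterior_true_label` and all numbers are hypothetical.

```python
def posterior_true_label(worker_labels, confusion, prior):
    """Posterior over the true class of one item given noisy worker labels.

    worker_labels: list of (worker_id, reported_label) pairs.
    confusion[w][t][l]: probability that worker w reports l when the truth is t
                        (assumed known in this sketch).
    prior: current belief over the true class.
    """
    scores = list(prior)
    for worker, reported in worker_labels:
        for true_class in range(len(scores)):
            scores[true_class] *= confusion[worker][true_class][reported]
    total = sum(scores)
    if total == 0:
        raise ValueError("labels are impossible under the given model")
    return [s / total for s in scores]


confusion = {
    0: [[0.9, 0.1], [0.2, 0.8]],  # a fairly reliable worker
    1: [[0.6, 0.4], [0.4, 0.6]],  # a noisy worker
}
prior = [0.5, 0.5]

# Offline: both answers at once.
batch = posterior_true_label([(0, 0), (1, 1)], confusion, prior)

# Online: one answer per chunk, feeding each posterior back in as the prior.
step1 = posterior_true_label([(0, 0)], confusion, prior)
online = posterior_true_label([(1, 1)], confusion, step1)
# Both routes give approximately [0.75, 0.25].
```

Because the update is multiplicative, processing answers in chunks gives the same posterior as processing them together, which is what makes incremental aggregation possible in this simplified setting.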