Does Knowledge Distillation Matter for Large Language Model-Based Bundle Generation?
Kaidong Feng (Yanshan University)
Zhu Sun (Singapore University of Technology and Design)
Jie Yang (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Hui Fang (Shanghai University of Finance and Economics)
Xinghua Qu (ByteDance)
Wenyuan Liu (Yanshan University)
Abstract
Large Language Models (LLMs) have been extensively applied in various recommendation scenarios, including bundle generation, thanks to their exceptional reasoning capabilities and comprehensive knowledge. However, exploiting large-scale LLMs for bundle generation introduces significant efficiency challenges, primarily high computational costs during fine-tuning and inference due to their massive parameterization. Knowledge Distillation (KD) offers a promising solution by transferring expertise from large teacher models to more compact student models. This study systematically investigates KD approaches for bundle generation with the goal of minimizing computational demands while preserving performance. Specifically, we explore three critical research questions: (1) how does the format of distilled knowledge impact bundle generation performance? (2) to what extent does the quantity of distilled knowledge influence the performance? and (3) how do different ways of utilizing the distilled knowledge affect the performance? To support this investigation, we propose a comprehensive KD framework that (i) progressively extracts knowledge from raw data in increasingly complex forms, i.e., frequent patterns → formalized rules → deep thoughts; (ii) captures varying quantities of distilled knowledge through different sampling strategies, multi-domain accumulation, and multi-format aggregation; and (iii) exploits complementary LLM adaptation techniques (in-context learning, supervised fine-tuning, and their combination) to leverage the distilled knowledge for domain-specific adaptation and enhanced efficiency in small student models. Through extensive experiments on multiple real-world datasets, we provide valuable insights into how knowledge format, quantity, and utilization methods collectively shape the performance of LLM-based bundle generation. The results demonstrate the significant potential of KD for more efficient yet effective LLM-based bundle generation.
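To make the simplest knowledge format and one utilization route concrete, the following minimal Python sketch mines frequent patterns from toy bundles and places them in a prompt for in-context learning. It is an illustration under stated assumptions, not the framework evaluated in the paper: the function names `frequent_patterns` and `build_prompt`, the choice of co-occurring category pairs as the pattern type, the `min_support` threshold, the prompt wording, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(bundles, min_support=2):
    """Count unordered category pairs that co-occur within a bundle.

    Assumption: a "frequent pattern" is a pair of categories that appears
    together in at least `min_support` bundles.
    """
    counts = Counter()
    for bundle in bundles:
        # Deduplicate and sort so each pair is counted once per bundle.
        for pair in combinations(sorted(set(bundle)), 2):
            counts[pair] += 1
    return [(pair, n) for pair, n in counts.most_common() if n >= min_support]


def build_prompt(patterns, session_items):
    """Place the mined patterns in a prompt as in-context knowledge."""
    lines = ["Categories that often appear together in bundles:"]
    lines += [f"- {a} + {b} (seen in {n} bundles)" for (a, b), n in patterns]
    lines.append("Items in the session: " + ", ".join(session_items))
    lines.append("Group the items that fit together into bundles.")
    return "\n".join(lines)


# Hypothetical toy data: each bundle is a list of item categories.
toy_bundles = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["laptop", "mouse"],
    ["phone", "charger"],
]

patterns = frequent_patterns(toy_bundles)
prompt = build_prompt(patterns, ["phone", "case", "mouse"])
print(prompt)
```

The resulting prompt text would be passed to a student model at inference time. Pairs of the same kind could also be written into training examples for supervised fine-tuning, which is the other utilization route the abstract names.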