Bridging the Semantic-Collaborative Gap

Unified Item Quantization for LLM-based Generative Recommendation

Master Thesis (2026)

Author(s)

B. Lu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M. Mansoury – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. Hanjalic – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

S. Tan – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

LLM, Generative recommendation, Semantic representation, Collaborative signal

To reference this document use

https://resolver.tudelft.nl/uuid:9b6344dd-c577-403a-8ea3-b2e63c42effc

Publication Year

2026

Language

English

Graduation Date

28-05-2026

Awarding Institution

Delft University of Technology

Programme

Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Large Language Model (LLM)-based generative recommendation reformulates item retrieval as an autoregressive sequence generation problem, representing each item by a discrete semantic identifier (SID) obtained by vector-quantizing its embedding. A critical yet underexplored limitation of existing tokenization methods is the semantic-collaborative gap: SIDs derived purely from item content fail to capture the latent user preference patterns encoded in historical interaction data, whereas purely collaborative identifiers lack semantic grounding and generalize poorly to sparse or cold-start scenarios.

To bridge this gap, we propose the Unified Q-Former (UQF), a novel pre-quantization fusion framework that explicitly integrates semantic and collaborative signals into a unified item representation before discretization. Inspired by the query-based multimodal alignment of BLIP-2, UQF uses a set of learnable queries, parallel cross-attention over pre-trained item text embeddings and graph-based collaborative embeddings (from LightGCN), and adaptive gated fusion to dynamically extract complementary information from both modalities. To improve robustness and preserve neighborhood structure, the framework is optimized with a hybrid contrastive learning objective over both structural and semantic neighbors, combined with asymmetric modality dropout.
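The fusion step described above can be sketched in NumPy as follows. This is a minimal, forward-pass-only illustration under stated assumptions, not the thesis implementation. It assumes single-head attention and random weights standing in for learned ones. It also assumes an element-wise sigmoid gate, mean-pooling over the query outputs, and hypothetical dropout probabilities. Which modality is dropped more often is likewise an assumption. The hybrid contrastive objective and the training loop are not shown, and `init_params` and `fuse_item` are illustrative names.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (n_q, d) learnable query vectors
    context: (n_ctx, d) embeddings from one modality
    returns: (n_q, d), one modality-specific summary per query
    """
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def init_params(d, n_queries, rng):
    # Random stand-ins for parameters that a real model would learn.
    def mat(*shape):
        return rng.normal(0.0, 1.0 / np.sqrt(d), size=shape)
    return {
        "queries": mat(n_queries, d),
        "text": (mat(d, d), mat(d, d), mat(d, d)),    # W_q, W_k, W_v for text
        "collab": (mat(d, d), mat(d, d), mat(d, d)),  # W_q, W_k, W_v for collaborative
        "w_gate": mat(2 * d, d),
        "b_gate": np.zeros(d),
    }


def fuse_item(text_ctx, collab_ctx, params, rng=None,
              p_drop_text=0.1, p_drop_collab=0.3):
    """Fuse one item's text and collaborative embeddings into one vector.

    If rng is given (training mode), apply asymmetric modality dropout:
    each branch has its own drop probability and at most one branch is
    zeroed per call. The probabilities here are hypothetical.
    """
    h_text = cross_attention(params["queries"], text_ctx, *params["text"])
    h_collab = cross_attention(params["queries"], collab_ctx, *params["collab"])

    if rng is not None:
        u = rng.random()
        if u < p_drop_collab:
            h_collab = np.zeros_like(h_collab)
        elif u < p_drop_collab + p_drop_text:
            h_text = np.zeros_like(h_text)

    # Per-query, per-dimension gate in (0, 1) that mixes the two modalities.
    logits = np.concatenate([h_text, h_collab], axis=-1) @ params["w_gate"]
    gate = 1.0 / (1.0 + np.exp(-(logits + params["b_gate"])))
    fused = gate * h_text + (1.0 - gate) * h_collab  # (n_q, d)
    return fused.mean(axis=0)  # pool queries into one item vector (assumption)


rng = np.random.default_rng(0)
d = 16
params = init_params(d, n_queries=4, rng=rng)
text_ctx = rng.normal(size=(8, d))    # e.g. 8 token-level text embeddings
collab_ctx = rng.normal(size=(1, d))  # e.g. one LightGCN item embedding
z = fuse_item(text_ctx, collab_ctx, params)
print(z.shape)  # (16,)
```

Because the gate is computed per query and per dimension, the model can lean on collaborative evidence for some features and on text for others. Zeroing one branch during training pushes the fused vector to stay informative when a modality is weak or missing, as happens with cold-start items.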

The resulting unified representations are quantized into discrete SIDs via residual vector quantization (RQ-VAE) and serve as the target tokens that a downstream LLM recommender learns to generate. Extensive experiments on two real-world Amazon Review datasets (Office Products and Musical Instruments) demonstrate that UQF consistently improves the performance of state-of-the-art LC-Rec and TIGER-style generative recommendation backbones. The framework outperforms strong traditional, sequential, and recent unified generative baselines, and yields highly interpretable, hierarchical SID structures with significantly improved semantic and collaborative consistency.
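Residual quantization, the step that turns each unified vector into a SID, can be illustrated with small fixed codebooks. The sketch below shows only the greedy nearest-codeword lookup. In an RQ-VAE, the vector would first pass through a learned encoder, and the codebooks would be trained jointly with a decoder. The toy 2-D codebooks and the `<a_1><b_2>...` token format are hypothetical choices for illustration.

```python
import numpy as np


def residual_quantize(z, codebooks):
    """Greedy residual vector quantization of one vector.

    z: (d,) unified item representation
    codebooks: list of (K, d) arrays, one per level (coarse to fine)
    returns: (codes, z_hat), where codes[l] is the index chosen at level l
    """
    residual = z.astype(float).copy()
    z_hat = np.zeros_like(residual)
    codes = []
    for cb in codebooks:
        dists = np.sum((cb - residual) ** 2, axis=1)  # squared L2 to each codeword
        idx = int(np.argmin(dists))
        codes.append(idx)
        z_hat += cb[idx]
        residual -= cb[idx]  # the next level quantizes what is still unexplained
    return tuple(codes), z_hat


def sid_tokens(codes):
    # Hypothetical token format: one level-specific token per code.
    prefixes = "abcdefgh"
    return "".join(f"<{prefixes[level]}_{c}>" for level, c in enumerate(codes))


codebooks = [
    np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]),    # coarse level
    np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),    # middle level
    np.array([[0.25, 0.0], [0.0, 0.25], [0.0, 0.0]]),  # fine level
]
codes, z_hat = residual_quantize(np.array([4.9, 1.2]), codebooks)
print(codes)              # (1, 2, 0)
print(sid_tokens(codes))  # <a_1><b_2><c_0>
```

Each level refines what earlier levels left unexplained. Items that share a prefix such as `<a_1>` are therefore close at the coarsest level, which is what gives SIDs their hierarchical structure.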

Files

Full-thesis.pdf (PDF, 20.7 MB)

License info not available