Transformers can do Bayesian Clustering
P. Bhaskaran (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Tom J. Viering – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
O.K. Shirekar – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Marcel J.T. Reinders – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
J.W. Böhmer – Graduation committee member (TU Delft - Sequential Decision Making)
Abstract
Motivation: Clustering is an unsupervised learning task with broad applications. Traditional clustering methods often rely on point estimates of model parameters, which can limit their ability to capture uncertainty. Bayesian clustering addresses this by incorporating uncertainty into parameter estimation. However, existing Bayesian inference methods such as Markov Chain Monte Carlo and Variational Inference are computationally intensive and can produce biased approximations. To overcome these limitations, we propose Cluster-PFN, a transformer-based model inspired by Prior-Data Fitted Networks. In a single forward pass, Cluster-PFN simultaneously approximates the posterior distributions over the cluster assignment of each data point and over the total number of clusters in the dataset. The model provides fast and accurate Bayesian clustering, supports data with up to five features, and can be conditioned on a user-specified number of clusters.
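To make the description above concrete, the following minimal PyTorch sketch shows how a transformer encoder could emit, in one forward pass, an approximate posterior over each point's cluster assignment and over the total number of clusters. The architecture, the TinyClusterPFN name, the MAX_K cap, and all hyperparameters are illustrative assumptions, not the actual Cluster-PFN design from the thesis.

import torch
import torch.nn as nn

N_FEATURES = 5   # the abstract states support for data with up to five features
MAX_K = 5        # illustrative cap on the number of clusters (an assumption)

class TinyClusterPFN(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Linear(N_FEATURES, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.assign_head = nn.Linear(d_model, MAX_K)  # per-point assignment logits
        self.count_head = nn.Linear(d_model, MAX_K)   # logits over the number of clusters

    def forward(self, x):                             # x: (batch, n_points, N_FEATURES)
        h = self.encoder(self.embed(x))
        p_assign = self.assign_head(h).softmax(-1)            # (batch, n_points, MAX_K)
        p_count = self.count_head(h.mean(dim=1)).softmax(-1)  # (batch, MAX_K)
        return p_assign, p_count

# One forward pass over a toy dataset returns both posteriors at once
# (the model is untrained here, so the outputs are only structurally meaningful).
x = torch.randn(1, 200, N_FEATURES)
p_assign, p_count = TinyClusterPFN()(x)
print(p_count[0])                    # approximate posterior over k = 1..MAX_K
print(p_assign[0].argmax(-1)[:10])   # MAP cluster assignment for the first 10 points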
Results: Our results demonstrate that Cluster-PFN can predict the number of clusters up to 20% more accurately than standard heuristics. It also outperforms the Bayesian Gaussian Mixture Model using Variational Inference (Bayesian GMM VI), achieving up to 60% higher scores on certain external metrics while being up to 20 times faster during inference. Additionally, Cluster-PFN surpasses both the traditional Gaussian Mixture Model and K-means++ across the same external evaluation metrics.
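For reference, the baselines named in the results are typically run as in the sketch below. It assumes the scikit-learn implementations (BayesianGaussianMixture for Bayesian GMM VI, GaussianMixture, and KMeans with k-means++ initialization) and uses the adjusted Rand index as one example of an external metric; the thesis' exact experimental setup, datasets, and metrics may differ.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import BayesianGaussianMixture, GaussianMixture

# Synthetic data with known labels, so an external metric can be computed.
X, y_true = make_blobs(n_samples=500, n_features=5, centers=4, random_state=0)

baselines = {
    # Bayesian GMM fitted with variational inference ("Bayesian GMM VI").
    "Bayesian GMM VI": BayesianGaussianMixture(n_components=10, random_state=0),
    "GMM": GaussianMixture(n_components=4, random_state=0),
    "K-means++": KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=0),
}

for name, model in baselines.items():
    labels = model.fit_predict(X)
    # Adjusted Rand index: one example of an external clustering metric.
    print(f"{name}: ARI = {adjusted_rand_score(y_true, labels):.3f}")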
Files
File under embargo until 04-07-2026