Beyond the Blend: A Ground-Truth Analysis of Bitcoin Mixer User Patterns
Employing machine learning to unravel the relationship between pre- and post-mixing transactions of Bitcoin mixer users
P.H.M. de Haan (TU Delft - Technology, Policy and Management)
Rolf S. van Wegberg – Mentor (TU Delft - Organisation & Governance)
F. d'Hont – Graduation committee member (TU Delft - Policy Analysis)
K.J.M. Lubbertsen – Graduation committee member (Fiscale inlichtingen- en opsporingsdienst (FIOD))
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Bitcoin mixers break the visible trail between incoming and outgoing transactions. By severing the link between pre-mixing and post-mixing addresses, they provide anonymity that is attractive for laundering illicit funds. For investigators, this creates two obstacles: the sheer number of mixer outputs overwhelms investigative capacity, and the lack of insight into a mixer's internal mechanics forces reliance on external transaction signals.
This thesis investigates whether transaction patterns before and after mixing can reduce the pool of possible post-mixing addresses linked to a pre-mixing address. The aim is not to prove exact one-to-one links but to narrow the search space so investigators can focus on the most likely outcomes.
We use a unique dataset seized from Bestmixer.io, a centralised mixer dismantled in 2019, containing thousands of verified pre- and post-mixing addresses. The analysis proceeds in two stages. First, we cluster wallets on address-level attributes using HDBSCAN, which yields only coarse profiles. Second, we build transaction graphs capturing how funds move into and out of the mixer, learn graph embeddings with a Graph Autoencoder, and cluster those embeddings with k-means. This graph-based view reveals clearer transaction patterns. Pre-mixing, we identify consolidators who pool funds, straightforward depositors who deposit from exchanges, aggregator funnels that combine smaller inputs, and higher-risk users who deposit via unregulated services. Post-mixing, we find splitters who disperse funds, large distributors who send bigger amounts to fewer addresses, and straightforward users with minimal redistribution.
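The clustering step of the second stage can be illustrated with a minimal sketch. It assumes the graph embeddings already exist (here replaced by six made-up 2-D vectors, not thesis data) and uses a plain-Python Lloyd's k-means with a simple farthest-point initialisation so that the example stays dependency-free. The function names `init_centroids` and `kmeans` are hypothetical, and the thesis may well have used a library implementation with different settings.

```python
import math

def init_centroids(points, k):
    """Deterministic farthest-point initialisation (an illustrative choice)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Toy stand-ins for graph embeddings (made up for illustration only).
embeddings = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),   # one tight group
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),   # a second, distant group
]
labels, centroids = kmeans(embeddings, k=2)
print(labels)
```

In a real pipeline the embedding dimension and the number of clusters would be chosen from the data, for example by inspecting cluster quality for several values of k.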
We then test whether pre-mixing patterns can predict post-mixing outcomes. Using tree-based ensemble models (Random Forest and Gradient Boosting) with graph embeddings and the original deposit amount, the best model achieves 48 percent accuracy across five classes, more than double the 20 percent baseline. This demonstrates that transaction graph signals can probabilistically reduce the investigative search space.
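To make the idea of "reducing the search space" concrete, the sketch below ranks predicted post-mixing classes by probability and keeps the smallest set that reaches a chosen cumulative probability. The class labels, the probabilities, and the 0.8 coverage threshold are all illustrative assumptions, not values from the thesis; the helper names `chance_baseline` and `shortlist` are hypothetical. It also reads the 20 percent baseline as uniform guessing over five classes, which is an assumption about how the baseline was defined.

```python
def chance_baseline(n_classes: int) -> float:
    """Accuracy of uniform random guessing over n_classes."""
    return 1.0 / n_classes

def shortlist(probs: dict, coverage: float = 0.8) -> list:
    """Smallest set of classes, most likely first, whose summed
    probability reaches the requested coverage."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, total = [], 0.0
    for label, p in ranked:
        kept.append(label)
        total += p
        if total >= coverage:
            break
    return kept

# Made-up class probabilities for one pre-mixing address (illustrative only).
predicted = {
    "class_0": 0.15,
    "class_1": 0.48,
    "class_2": 0.06,
    "class_3": 0.27,
    "class_4": 0.04,
}

baseline = chance_baseline(5)          # uniform guess over five classes
lift = 0.48 / baseline                 # reported accuracy relative to chance
candidates = shortlist(predicted, coverage=0.8)
print(baseline, round(lift, 2), candidates)
```

With these example numbers, three of the five classes are enough to reach the coverage threshold, so an investigator would set aside the two least likely classes first.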
The study provides the first ground-truth typology of mixer transaction patterns and shows that probabilistic “de-mixing” is feasible. Rather than pinpointing a single post-mixing address, the method highlights a smaller set of likely candidates, offering law enforcement a way to prioritise leads without access to a mixer’s internal mechanics.