Learning from Leakage: Database Reconstruction from Just a Few Multidimensional Range Queries
P. Li (TU Delft - Electrical Engineering, Mathematics and Computer Science)
G. Smaragdakis – Mentor (TU Delft - Cyber Security)
K. Liang – Mentor (TU Delft - Cyber Security)
H. Chen – Mentor (TU Delft - Cyber Security)
Jérémie Decouchant – Graduation committee member (TU Delft - Data-Intensive Systems)
More Info
expand_more
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Searchable Encryption (SE) has shown a lot of promise towards enabling secure and efficient queries over encrypted data. In order to achieve this efficiency, SE inevitably leaks some information, and a big open question is how dangerous this leakage is. While prior reconstruction attacks have demonstrated effectiveness in one-dimensional settings, extending them to high-dimensional datasets remains challenging. Existing methods either demand excessive query information (e.g. an attacker that has observed all possible responses) or produce low-quality reconstructions in sparse databases.
In this work, we present REMIN, a new leakage-abuse attack against SE schemes in multi-dimensional settings, based on access and search pattern leakage from range queries. Our approach leverages unsupervised representation learning to transform query co-occurrence frequencies into geometric signals, allowing the attacker to infer relative spatial relationships between records. This enables accurate and scalable reconstruction of high-dimensional datasets under minimal leakage. Furthermore, we introduce REMIN-P, a practical variant of the attack that incorporates a poisoning strategy. By injecting a small number of auxiliary anchor points—either known or intentionally leaked—REMIN-P significantly improves reconstruction quality, particularly in sparse or boundary regions.
We evaluate our attacks extensively on both synthetic and real-world structured datasets. Compared to state-of-the-art reconstruction attacks, our reconstruction attack achieves up to 50% reduction in mean squared error (MSE), all while maintaining fast and scalable runtime. When the poisoning strategy is chosen properly, our poisoning attack further reduces MSE by an additional 50% on average. To the best of our knowledge, these are the first attacks that enables accurate multi-dimensional reconstruction under low-leakage conditions for any type of database.