De-DSI

Decentralised Differentiable Search Index

Conference Paper (2024)
Author(s)

P.M. Neague (TU Delft - Data-Intensive Systems)

Marcel Gregoriadis (TU Delft - Data-Intensive Systems)

J.A. Pouwelse (TU Delft - Data-Intensive Systems)

Research Group
Data-Intensive Systems
DOI
https://doi.org/10.1145/3642970.3655837
Publication Year
2024
Language
English
Pages (from-to)
134-143
ISBN (print)
979-8-4007-0541-0
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This study introduces De-DSI, a novel framework that fuses large language models (LLMs) with genuine decentralization for information retrieval, specifically applying the differentiable search index (DSI) concept in a decentralized setting. Focused on efficiently connecting novel user queries with document identifiers without direct document access, De-DSI operates solely on query-docid pairs. To enhance scalability, an ensemble of DSI models is introduced, where the dataset is partitioned into smaller shards for individual model training. This approach not only maintains accuracy by reducing the amount of data each model needs to handle, but also facilitates scalability by aggregating outcomes from multiple models. The aggregation uses beam search to identify the top docids from each model and applies a softmax function for score normalization, selecting the documents with the highest scores for retrieval. The decentralized implementation demonstrates retrieval success comparable to centralized methods, with the added benefit of being able to distribute the computational load across the network. This setup also allows multimedia items to be retrieved through magnet links, eliminating the need for platforms or intermediaries.
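
The aggregation step can be illustrated with a minimal sketch (not the authors' implementation): each shard-local DSI model returns beam-search candidates as (docid, log-probability) pairs, each model's scores are softmax-normalized, and docids are ranked by their combined normalized score. The function names, example docids, and the choice to sum normalized scores per docid across models are assumptions made purely for illustration.

import math
from typing import Dict, List, Tuple

def softmax(scores: List[float]) -> List[float]:
    # Normalize raw beam-search scores into a probability distribution.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_ensemble(
    beam_outputs: Dict[str, List[Tuple[str, float]]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    # beam_outputs maps each shard model to its beam-search candidates,
    # given as (docid, log-probability) pairs. Scores are softmax-normalized
    # per model, combined per docid (summation is an assumption), and the
    # top_k highest-scoring docids are returned.
    combined: Dict[str, float] = {}
    for model, candidates in beam_outputs.items():
        docids = [d for d, _ in candidates]
        probs = softmax([s for _, s in candidates])
        for docid, p in zip(docids, probs):
            combined[docid] = combined.get(docid, 0.0) + p
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

if __name__ == "__main__":
    # Hypothetical beam-search outputs from three shard-local models.
    outputs = {
        "model_shard_0": [("docid_17", -0.4), ("docid_03", -1.9), ("docid_88", -2.5)],
        "model_shard_1": [("docid_17", -0.7), ("docid_42", -1.1), ("docid_03", -3.0)],
        "model_shard_2": [("docid_42", -0.5), ("docid_88", -1.6), ("docid_17", -2.2)],
    }
    for docid, score in aggregate_ensemble(outputs):
        print(f"{docid}: {score:.3f}")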