Adaptive Distributed Streaming Similarity Joins

Conference Paper (2023)

Author(s)

G. Siachamis (TU Delft - Web Information Systems)

K. Psarakis (TU Delft - Web Information Systems)

M. Fragkoulis (Delivery Hero SE)

Odysseas Papapetrou (Eindhoven University of Technology)

A. van Deursen (TU Delft - Software Technology)

Asterios Katsifodimos (TU Delft - Web Information Systems)

Research Group

Web Information Systems

To reference this document use:

https://resolver.tudelft.nl/uuid:7110874a-a227-4407-a35e-7f78e4b2d8b8

Publication Year

2023

Language

English

Pages (from-to)

25-36

ISBN (electronic)

979-8-4007-0122-1

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

How can we perform similarity joins of multi-dimensional streams in a distributed fashion, achieving low latency? Can we adaptively repartition those streams in order to retain high performance under concept drifts? Current approaches to similarity joins are either restricted to single-node deployments or focus on set-similarity joins, failing to cover the ubiquitous case of metric-space similarity joins. In this paper, we propose the first adaptive distributed streaming similarity join approach that gracefully scales with variable velocity and distribution of multi-dimensional data streams. Our approach can adaptively rebalance the load of nodes in the case of concept drifts, allowing for similarity computations in the general metric space. We implement our approach on top of Apache Flink and evaluate its data partitioning and load balancing schemes on a set of synthetic datasets in terms of latency, comparisons ratio, and data duplication ratio.

Files

3583678.3596891.pdf

(pdf | 9.11 MB)