Integrating Massive Data Streams

Conference paper (2021)

Authors

G. Siachamis (Web Information Systems)

G.J.P.M. Houben (Web Information Systems)

A. van Deursen (Software Technology)

A. Katsifodimos (Web Information Systems)

Research Group

Web Information Systems (TU Delft)

Data integration, Data streams

To reference this document use:

http://resolver.tudelft.nl/uuid:93e4af22-ba92-401d-937e-cfb02ad004fb

Published Date

2021

Language

English

Faculty

Electrical Engineering, Mathematics and Computer Science

Department

Software Technology

Abstract

Data integration has been a long-standing and challenging problem for enterprises and researchers. Data residing in multiple heterogeneous sources must be integrated and prepared so that the valuable information it carries can be extracted and analysed. However, the volume and velocity of the data being produced, together with modern business needs for real-time results, have pushed data analytics, and therefore data integration, towards data streams. While data integration is a hard problem in and of itself, integrating data streams is even more challenging: streams are characterized by their high velocity, infinite nature, and predisposition to concept drift.

The goal of this doctoral work is to design and provide scalable methods that support data integration tasks on massive data streams, i.e., streaming data integration. The aim of this work is threefold. First, we aim to develop streaming methods that compute temporal stream data profiles and summaries describing the dynamic state of a stream over time. Second, we aim to develop methods and metrics of stream similarity, which can serve as a means to detect similar or complementary streams in a streaming data lake. Finally, we aim to optimize distributed streaming similarity joins, a very important operation that precedes entity linking and resolution. This paper discusses exciting challenges and open problems in the field, and a research plan for tackling them.
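
As a rough illustration of what a streaming similarity join computes, the sketch below matches records from two streams whose token sets have a Jaccard similarity above a threshold, keeping only a bounded window of recent records per stream. It is not the method proposed in the paper: the class name `WindowedSimilarityJoin`, the count-based window, whitespace tokenization, and the choice of Jaccard similarity are all assumptions made for illustration, and the sketch runs in a single process rather than in a distributed setting.

```python
from collections import deque


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two token sets; defined here as 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


class WindowedSimilarityJoin:
    """Hypothetical helper: joins each arriving record against the most
    recent `window` records of the opposite stream."""

    def __init__(self, window: int, threshold: float):
        self.threshold = threshold
        # One bounded buffer per input stream; the oldest record is
        # evicted automatically once the buffer holds `window` records.
        self.buffers = {
            "left": deque(maxlen=window),
            "right": deque(maxlen=window),
        }

    def insert(self, side: str, key: str, text: str) -> list:
        """Add a record to `side` and return its matches from the other stream."""
        if side not in self.buffers:
            raise ValueError("side must be 'left' or 'right'")
        other = "right" if side == "left" else "left"
        # Assumption: records are compared as sets of lower-cased words.
        tokens = frozenset(text.lower().split())
        matches = []
        for other_key, other_tokens in self.buffers[other]:
            score = jaccard(tokens, other_tokens)
            if score >= self.threshold:
                matches.append((key, other_key, score))
        self.buffers[side].append((key, tokens))
        return matches


join = WindowedSimilarityJoin(window=2, threshold=0.5)
join.insert("left", "l1", "Delft University of Technology")
print(join.insert("right", "r1", "delft university technology"))
```

Each arriving record is compared against every buffered record of the other stream, so the cost per record grows linearly with the window size. That per-record cost is one reason similarity joins over massive streams call for optimization and distribution.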

Files

Paper03_1_.pdf

(.pdf | 0.615 Mb)