Large-scale data stream processing systems

Book Chapter (2017)

Author(s)

Paris Carbone (KTH Royal Institute of Technology)

Gábor E. Gévay (Technical University of Berlin)

Gábor Hermann (Technical University of Berlin)

A. Katsifodimos (Technical University of Berlin)

Juan Soto (Technical University of Berlin)

Volker Markl (Technical University of Berlin)

Seif Haridi (KTH Royal Institute of Technology)

Affiliation

External organisation

DOI related publication

https://doi.org/10.1007/978-3-319-49340-4_7

Harness

To reference this document use:

https://resolver.tudelft.nl/uuid:d1474663-f41c-4369-9fa4-17da3a51f210

Publication Year

2017

Language

English

Pages (from-to)

219-260

ISBN (print)

978-3-319-49339-8

ISBN (electronic)

978-3-319-49340-4

Abstract

In our data-centric society, online services, decision making, and other aspects are increasingly dependent on trends and patterns extracted from data. A broad class of societal-scale data management problems requires system support for processing unbounded data with low latency and high throughput. Large-scale data stream processing systems treat data as infinite streams and are designed to satisfy such requirements. They have also evolved substantially, both in their support for expressive programming models and in efficient, durable runtime execution on commodity clusters. Expressive programming models offer convenient ways to declare continuous data properties and the computations applied to them, while hiding the details of how these data streams are physically processed and orchestrated in a distributed environment. Execution engines provide a runtime for such models, allowing scalable yet durable execution of any declared computation. In this chapter we introduce the major design aspects of large-scale data stream processing systems, covering programming model abstraction levels and runtime concerns. We then present a detailed case study on stateful stream processing with Apache Flink, an open-source stream processor that is used for a wide variety of processing tasks. Finally, we address, from a systemic point of view, the main challenges of the disruptive applications that large-scale data streaming enables.

Metadata only record. There are no files for this record.