Emma in action

Declarative Dataflows for scalable data analysis

Conference Paper (2016)

Author(s)

Alexander Alexandrov (Technical University of Berlin)

Andreas Salzmann (Technical University of Berlin)

Georgi Krastev (Technical University of Berlin)

Asterios Katsifodimos (Technical University of Berlin)

Volker Markl (Technical University of Berlin)

Affiliation

External organisation

DOI related publication

https://doi.org/10.1145/2882903.2899396

To reference this document use:

https://resolver.tudelft.nl/uuid:a733f774-729e-4940-abce-92aa3eabaa96

Publication Year

2016

Language

English

Volume number

26-June-2016

Pages (from-to)

2073-2076

ISBN (electronic)

9781450335317

Abstract

Parallel dataflow APIs based on second-order functions were originally seen as a flexible alternative to SQL. Over time, however, their complexity increased due to the number of physical aspects that had to be exposed by the underlying engines in order to facilitate efficient execution. To retain a sufficient level of abstraction and lower the barrier of entry for data scientists, projects like Spark and Flink currently offer domain-specific APIs on top of their parallel collection abstractions. This demonstration highlights the benefits of an alternative design based on deep language embedding. We showcase Emma, a programming language embedded in Scala. Emma promotes parallel collection processing through native constructs like Scala's for-comprehensions, a declarative syntax akin to SQL. In addition, Emma also advocates quoting the entire data analysis algorithm rather than its individual dataflow expressions. This allows for decomposing the quoted code into (sequential) control flow and (parallel) dataflow fragments, optimizing the dataflows in context, and transparently offloading them to an engine like Spark or Flink. The proposed design promises increased programmer productivity due to avoiding an impedance mismatch, thereby reducing the lag times and cost of data analysis.
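To illustrate the abstract's remark that for-comprehensions read like SQL, the following minimal sketch expresses one join-and-filter query twice over ordinary Scala collections: once as a for-comprehension and once with second-order functions. It uses only the Scala standard library and is not Emma's API; the `Author` and `Paper` types, the sample data, and the object name are hypothetical and introduced purely for illustration.

```scala
// Hypothetical record types, introduced only for this illustration.
final case class Author(id: Int, name: String)
final case class Paper(authorId: Int, title: String, year: Int)

object ComprehensionSketch {
  // Small in-memory sample data (assumed, not taken from the paper).
  val authors: Seq[Author] = Seq(Author(1, "Ada"), Author(2, "Grace"))
  val papers: Seq[Paper] = Seq(
    Paper(1, "Notes", 1843),
    Paper(2, "Compilers", 1952),
    Paper(2, "COBOL", 1959)
  )

  // Declarative form. Roughly comparable to:
  //   SELECT a.name, p.title
  //   FROM authors a JOIN papers p ON a.id = p.authorId
  //   WHERE p.year > 1900
  val viaComprehension: Seq[(String, String)] =
    for {
      a <- authors
      p <- papers
      if p.authorId == a.id && p.year > 1900
    } yield (a.name, p.title)

  // The same query written directly with second-order functions
  // (flatMap / withFilter / map), the style the abstract contrasts with.
  val viaCombinators: Seq[(String, String)] =
    authors.flatMap { a =>
      papers
        .withFilter(p => p.authorId == a.id && p.year > 1900)
        .map(p => (a.name, p.title))
    }

  def main(args: Array[String]): Unit = {
    println(viaComprehension)
    println(viaCombinators)
  }
}
```

Both definitions produce the same pairs, because the Scala compiler desugars a for-comprehension into this kind of `flatMap`, `withFilter`, and `map` chain. The sketch runs sequentially on local collections; it does not show the quoting, in-context optimization, or offloading to Spark or Flink that the abstract attributes to Emma.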

Metadata only record. There are no files for this record.