Sebastian Schelter | TU Delft Repository

An Intermediate Representation for Optimizing Machine Learning Pipelines

Journal article (2019) - Andreas Kunft (author), Asterios Katsifodimos (author), Sebastian Schelter (author), Sebastian Breß (author), Tilmann Rabl (author), Volker Markl (author)

Machine learning (ML) pipelines for model training and validation typically include preprocessing, such as data cleaning and feature engineering, prior to training an ML model. Preprocessing combines relational algebra and user-defined functions (UDFs), while model training uses ...

BlockJoin

Efficient Matrix Partitioning Through Joins

Conference paper (2017) - Andreas Kunft (author), A. Katsifodimos (author), Sebastian Schelter (author), Tilmann Rabl (author), Volker Markl (author)

Linear algebra operations are at the core of many Machine Learning (ML) programs. At the same time, a considerable amount of the effort for solving data analytics problems is spent in data preparation. As a result, end-to-end ML pipelines often consist of (i) relational operator ...

Apache Flink

Stream Analytics at Scale

Other (2016) - Asterios Katsifodimos (author), Sebastian Schelter (author)

Optimistic recovery for iterative dataflows in action

Conference paper (2015) - Sergey Dudoladov (author), C. Xu (author), Sebastian Schelter (author), Asterios Katsifodimos (author), Stephan Ewen (author), Kostas Tzoumas (author), Volker Markl (author)

Over the past years, parallel dataflow systems have been employed for advanced analytics in the field of data mining where many algorithms are iterative. These systems typically provide fault tolerance by periodically checkpointing the algorithm's state and, in case of failure, r ...