Andreas Kunft | TU Delft Repository

An Intermediate Representation for Optimizing Machine Learning Pipelines

Journal article (2019) - Andreas Kunft (author), Asterios Katsifodimos (author), Sebastian Schelter (author), Sebastian Breß (author), Tilmann Rabl (author), Volker Markl (author)

Machine learning (ML) pipelines for model training and validation typically include preprocessing, such as data cleaning and feature engineering, prior to training an ML model. Preprocessing combines relational algebra and user-defined functions (UDFs), while model training uses ...

BlockJoin

Efficient Matrix Partitioning Through Joins

Conference paper (2017) - Andreas Kunft (author), A. Katsifodimos (author), Sebastian Schelter (author), Tilmann Rabl (author), Volker Markl (author)

Linear algebra operations are at the core of many Machine Learning (ML) programs. At the same time, a considerable amount of the effort for solving data analytics problems is spent in data preparation. As a result, end-to-end ML pipelines often consist of (i) relational operator ...

Bridging the Gap

Towards optimization across linear and relational algebra

Conference paper (2016) - Andreas Kunft (author), Alexander Alexandrov (author), A. Katsifodimos (author), Volker Markl (author)

Advanced data analysis typically requires some form of preprocessing in order to extract and transform data before processing it with machine learning and statistical analysis techniques. Preprocessing pipelines are naturally expressed in dataflow APIs (e.g., MapReduce, Flink, e ...

Implicit parallelism through deep language embedding

Conference paper (2015) - Alexander Alexandrov (author), Andreas Kunft (author), A. Katsifodimos (author), Felix Schüler (author), Lauritz Thamsen (author), Odej Kao (author), Tobias Herb (author), Volker Markl (author)

The appeal of MapReduce has spawned a family of systems that implement or extend it. In order to enable parallel collection processing with User-Defined Functions (UDFs), these systems expose extensions of the MapReduce programming model as library-based dataflow APIs that are tigh ...