A plug-in infrastructure for the CodeFeedr project

Bachelor Thesis (2018)

Author(s)

J.C. Kuijpers (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J.J.R. Quist (TU Delft - Electrical Engineering, Mathematics and Computer Science)

W.D. Zorgdrager (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

T.E.P.M.F. Abeel – Mentor

Georgios Gousios – Graduation committee member

He Wang – Graduation committee member

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Big data, CodeFeedr, Data analysis, Stream processing, Apache Flink, Apache Kafka, Software analytics, Scala

To reference this document use:

https://resolver.tudelft.nl/uuid:832a88a1-95b3-4c58-8318-946913bb2932

More Info

Publication Year

2018

Language

English

Graduation Date

02-07-2018

Awarding Institution

Delft University of Technology

Programme

Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

CodeFeedr is a research project of the software engineering division at Delft University of Technology, carried out in collaboration with the Software Improvement Group. The research focuses on a software infrastructure that helps software practitioners make data-driven decisions. Frameworks such as Apache Flink already support high-performance data streaming. However, they require considerable setup effort, and adding new streaming queries is time-consuming. They are also limited in combining real-time data with historical data and in aggregating streams from multiple sources. The developed product is a plug-in framework on top of Apache Flink that provides a pipelining system for streaming queries. It includes abstractions for well-known sources such as GitHub, Travis CI and Twitter, as well as support for historical data in MongoDB. With this framework, users can focus on writing streaming queries instead of setting up environments, input sources and output destinations. The product also includes orchestration tools for running streaming jobs on a distributed system.

Files

BachelorProjectReport.pdf

(pdf | 2.28 Mb)

License info not available