A plug-in infrastructure for the CodeFeedr project

Abstract

CodeFeedr is a research project at the software engineering division of the Delft University of Technology, carried out in collaboration with the Software Improvement Group. The research focuses on a software infrastructure that helps software practitioners apply data-driven decision making. Frameworks such as Apache Flink are already capable of high-performance data streaming. However, they require considerable effort to set up, and adding new streaming queries is time-consuming. They are also limited in combining real-time data with historical data and in aggregating streams from multiple sources. The developed product is a plug-in framework built on top of Apache Flink that provides a pipelining system for streaming queries. It includes abstractions for well-known sources such as GitHub, Travis CI and Twitter, as well as support for historical data stored in MongoDB. With this framework, users can spend their effort on writing streaming queries instead of setting up environments, input sources and output destinations. The product also includes orchestration tools for running streaming jobs on a distributed system.