An Extension of CodeFeedr

Bachelor Thesis (2020)
Author(s)

R.V.T. van der Heijden (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M.C. van Wijngaarden (TU Delft - Electrical Engineering, Mathematics and Computer Science)

W.R. Zonneveld (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Katsifodimos – Mentor (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2020 Roald van der Heijden, Matthijs van Wijngaarden, Wouter Zonneveld
Publication Year
2020
Language
English
Graduation Date
05-02-2020
Awarding Institution
Delft University of Technology
Project
CodeFeedr
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

CodeFeedr is a Mining Software Repositories (MSR) tool designed to efficiently mine massive amounts of streaming project data from various sources, using Flink’s streaming framework in combination with Kafka. The project was commissioned by researchers at TU Delft in the field of Data Science and Software Engineering, with the goal of extending the product, which already existed in a development stage. At the start of the project, CodeFeedr consisted of core pipeline functionality and a limited number of plugins that process data sources. CodeFeedr-1Up, as the development team calls itself, set two goals: first, to increase the number of plugins available to the client, where each plugin makes a software repository source usable; second, to implement a REPL that accepts user-friendly SQL-like queries and outputs the queried data stream. Plugins for Maven, Cargo, NPM and ClearlyDefined have been developed and added to CodeFeedr. Furthermore, these data sources can be queried according to their data structure, for sequential pipelines. With user aid and documentation in mind, logical data models of each plugin’s internal structure have been drawn and included in the report.
