High-Level Mutations for JSON Typed Data in Big Data Fuzz Testing

Bachelor Thesis (2021)
Author(s)

L.E. Rhijnsburger (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

B. Ozkan – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J.E.A.P. Decouchant – Coach (TU Delft - Data-Intensive Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Lars Rhijnsburger
More Info
Publication Year
2021
Language
English
Graduation Date
02-07-2021
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Fuzzing of Big Data applications is a relatively new field that still lacks effective tools for automated testing. A recently published framework, BigFuzz, made fuzz testing of Big Data systems feasible, but no solution existed for Big Data programs that use JSON-typed data. Big Data systems often rely on JSON-typed data, yet no JSON-aware fuzzer for these systems is currently publicly available. This work makes it possible to supply JSON-typed input data and to fuzz it in every iteration. It requires a user-defined input specification describing the set of valid JSON inputs for the program under test, as well as a Java program converted from the Spark program under test. The latter will most likely become unnecessary in the future, since the conversion can probably be automated.

The approach is shown to find bugs effectively within a rather small number of trials. On the other hand, it loses descriptive exceptions, because it finds bugs later in the program rather than at the input validation phase. The approach is still too limited to be applied extensively in automated testing, but it serves as a proof of concept that automatically finding bugs in Big Data applications that work with JSON-typed data is indeed possible.
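
The idea of driving a fuzzer from an input specification, as summarised in the abstract, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the specification format (SPEC), the four mutation kinds, and the names generate_valid, mutate, is_valid, fuzz and toy_program are hypothetical and are not taken from the thesis or from BigFuzz. The sketch generates a valid JSON record from the specification, applies one mutation per iteration, and passes the serialised result to a program under test.

```python
import copy
import json
import random

# Hypothetical input specification (format assumed for illustration only).
SPEC = {
    "age": {"type": "int", "min": 0, "max": 120},
    "name": {"type": "str", "max_len": 10},
    "active": {"type": "bool"},
}


def generate_valid(spec, rng):
    """Build one record that satisfies the specification."""
    record = {}
    for field, rule in spec.items():
        if rule["type"] == "int":
            record[field] = rng.randint(rule["min"], rule["max"])
        elif rule["type"] == "str":
            length = rng.randint(1, rule["max_len"])
            record[field] = "".join(rng.choice("abcdefghij") for _ in range(length))
        elif rule["type"] == "bool":
            record[field] = rng.random() < 0.5
    return record


def is_valid(record, spec):
    """Check a record against the specification."""
    for field, rule in spec.items():
        if field not in record:
            return False
        value = record[field]
        if rule["type"] == "int":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                return False
            if not rule["min"] <= value <= rule["max"]:
                return False
        elif rule["type"] == "str":
            if not isinstance(value, str) or not 1 <= len(value) <= rule["max_len"]:
                return False
        elif rule["type"] == "bool":
            if not isinstance(value, bool):
                return False
    return True


def mutate(record, spec, rng):
    """Apply one mutation (assumed set of kinds) to a copy of the record."""
    mutant = copy.deepcopy(record)
    field = rng.choice(sorted(spec))
    rule = spec[field]
    kinds = ["drop_field", "null_value", "wrong_type"]
    if rule["type"] == "int":
        kinds.append("out_of_range")
    kind = rng.choice(kinds)
    if kind == "drop_field":
        del mutant[field]
    elif kind == "null_value":
        mutant[field] = None
    elif kind == "wrong_type":
        mutant[field] = "not-a-number" if rule["type"] == "int" else 12345
    else:  # out_of_range, only chosen for int fields
        mutant[field] = rule["max"] + 1
    return mutant, kind, field


def fuzz(program, spec, iterations, seed=0):
    """Run the program on one mutated JSON input per iteration."""
    rng = random.Random(seed)
    failures = []
    for _ in range(iterations):
        mutant, kind, field = mutate(generate_valid(spec, rng), spec, rng)
        try:
            program(json.dumps(mutant))
        except Exception as exc:  # record any crash of the program under test
            failures.append((kind, field, type(exc).__name__))
    return failures


def toy_program(payload):
    """Stand-in program under test with no input validation."""
    record = json.loads(payload)
    return record["age"] + 1
```

Calling fuzz(toy_program, SPEC, 200) returns a list of (mutation kind, field, exception name) tuples. Because toy_program performs no input validation, the failures surface as generic exceptions raised deep in the computation, which mirrors the loss of descriptive exceptions noted in the abstract.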
