High-Level Mutations for JSON Typed Data in Big Data Fuzz Testing

Abstract

Fuzzing of Big Data applications is a relatively new field that still lacks effective tools to support automated testing. A recently published framework, BigFuzz, made fuzz testing feasible for Big Data systems, but it offers no way to handle Big Data programs that use JSON-typed data. Such data is common in Big Data systems, yet no fuzzers for JSON-typed data in this setting are currently publicly available. This work makes it possible to supply JSON-typed input data and to mutate that data in each fuzzing iteration. It requires a user-defined input specification describing the set of valid JSON inputs for the program under test, as well as a Java program converted from the Spark program to be tested. The manual conversion will likely become unnecessary in the future, since it can probably be automated.

The approach is shown to find bugs within a relatively small number of trials. On the other hand, it loses descriptive exceptions, because bugs surface later in the program rather than at the input validation phase. The approach is still too limited to be applied broadly in automated testing, but it serves as a proof of concept that automatically finding bugs in Big Data applications that work with JSON-typed data is in fact possible.
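
As an illustration only, the idea of specification-guided mutation of JSON input can be sketched in a few lines of Python. This is not the implementation described in this work: the specification format (`SPEC`), the mutation kinds, and the stand-in `program_under_test` are all hypothetical and were chosen to keep the sketch self-contained. It assumes a flat JSON record per input line and applies one high-level mutation per fuzzing iteration.

```python
import json
import random

# Hypothetical input specification: field name -> type and constraints.
SPEC = {
    "age": {"type": "int", "min": 0, "max": 120},
    "name": {"type": "str", "max_len": 8},
    "active": {"type": "bool"},
}

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def generate_value(rule, rng):
    """Draw one value that satisfies a single field rule."""
    if rule["type"] == "int":
        return rng.randint(rule["min"], rule["max"])
    if rule["type"] == "str":
        length = rng.randint(1, rule["max_len"])
        return "".join(rng.choice(LETTERS) for _ in range(length))
    return rng.random() < 0.5  # bool


def generate_valid(spec, rng):
    """Build a record in which every field satisfies the specification."""
    return {field: generate_value(rule, rng) for field, rule in spec.items()}


def mutate(record, spec, rng):
    """Apply one high-level mutation to a copy of the record."""
    mutated = dict(record)
    field = rng.choice(sorted(spec))
    rule = spec[field]
    kind = rng.choice(["regenerate", "boundary", "drop"])
    if kind == "regenerate":
        mutated[field] = generate_value(rule, rng)
    elif kind == "drop":
        mutated.pop(field, None)  # simulate a missing JSON key
    elif rule["type"] == "int":
        mutated[field] = rng.choice([rule["min"], rule["max"]])
    elif rule["type"] == "str":
        mutated[field] = "a" * rule["max_len"]
    else:
        mutated[field] = not mutated.get(field, False)
    return mutated


def program_under_test(json_line):
    """Hypothetical stand-in for the converted program."""
    row = json.loads(json_line)
    return 1000 // row["age"]  # fails on age == 0 or on a missing field


def fuzz(trials, seed=0):
    """Mutate the input once per iteration and record failing inputs."""
    rng = random.Random(seed)
    current = generate_valid(SPEC, rng)
    failures = []
    for _ in range(trials):
        current = mutate(current, SPEC, rng)
        line = json.dumps(current)
        try:
            program_under_test(line)
        except Exception as exc:
            failures.append((line, type(exc).__name__))
            current = generate_valid(SPEC, rng)  # restart from a valid input
    return failures


if __name__ == "__main__":
    for line, error in fuzz(50)[:5]:
        print(error, line)
```

In this sketch the failures surface as a `ZeroDivisionError` or `KeyError` raised inside the program logic rather than as a descriptive validation error, which mirrors the trade-off noted above.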