High-Level Mutations for JSON Typed Data in Big Data Fuzz Testing

Bachelor Thesis (2021)

Author(s)

L.E. Rhijnsburger (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

B. Ozkan – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J.E.A.P. Decouchant – Coach (TU Delft - Data-Intensive Systems)

Faculty

Electrical Engineering, Mathematics and Computer Science

JSON, Big data applications, BigFuzz, Big data systems, Unique errors, JSON schema, Input seed, Input specification

To reference this document use:

https://resolver.tudelft.nl/uuid:7b944eaf-80d3-4ba2-9af9-769b651c3453

Publication Year

2021

Language

English

Graduation Date

02-07-2021

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Fuzzing Big Data applications is a relatively new field that still lacks effective tools to support automated testing. Recently, a framework called BigFuzz was published that made fuzz testing for Big Data systems feasible. However, it offered no way to test Big Data programs that use JSON-typed data. Big Data systems often use JSON-typed data, yet no JSON-typed fuzzers for Big Data systems are currently publicly available. This work makes it possible to support JSON-typed input data and to apply fuzzing to it in each iteration. The approach requires a user-defined input specification describing the set of valid JSON inputs for the program under test, and a Java program converted from the Spark program under test. The latter will likely become unnecessary in the future, since this conversion can probably be automated.

This work is shown to be effective in finding bugs in a rather small number of trials. On the other hand, it loses descriptive exceptions, since it finds bugs later in the program rather than at the input validation phase. The work is still too limited to be applied extensively in automated testing, but it serves as a proof of concept that automatically finding bugs in Big Data applications that work with JSON-typed data is in fact possible.
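
To make the idea of specification-driven, per-iteration mutation of JSON input more concrete, the following is a minimal illustrative sketch in Python. It is not the implementation from the thesis, which builds on the Java-based BigFuzz framework. The specification format (`SPEC`), the three mutation operators, and the function names `applicable_mutations` and `mutate` are all assumptions made for illustration only.

```python
import copy
import json
import random

# Hypothetical, simplified input specification (not JSON Schema and not the
# format used in the thesis): field name -> expected type and constraints.
SPEC = {
    "name": {"type": "string"},
    "age": {"type": "integer", "min": 0, "max": 120},
    "active": {"type": "boolean"},
}

# Replacement values whose type deliberately differs from the expected one.
WRONG_TYPE_VALUES = {
    "string": 12345,
    "integer": "not-a-number",
    "boolean": "yes",
}


def applicable_mutations(spec):
    """List every (operator, field) pair that is meaningful for this spec."""
    pairs = []
    for field in sorted(spec):
        pairs.append(("drop_field", field))
        pairs.append(("change_type", field))
        if spec[field]["type"] == "integer" and "max" in spec[field]:
            pairs.append(("out_of_range", field))
    return pairs


def mutate(seed, spec, rng):
    """Return (mutant, label): a copy of seed with one mutation applied."""
    mutant = copy.deepcopy(seed)
    op, field = rng.choice(applicable_mutations(spec))
    if op == "drop_field":
        # Remove a field that the specification expects to be present.
        mutant.pop(field, None)
    elif op == "change_type":
        # Replace the value with one of a different JSON type.
        mutant[field] = WRONG_TYPE_VALUES[spec[field]["type"]]
    else:
        # out_of_range: step just past the declared upper bound.
        mutant[field] = spec[field]["max"] + 1
    return mutant, f"{op}:{field}"


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run can be reproduced
    seed_input = {"name": "alice", "age": 30, "active": True}
    for _ in range(5):
        mutant, label = mutate(seed_input, SPEC, rng)
        # In a real fuzzing loop the mutant would be fed to the program
        # under test; here it is only printed.
        print(label, json.dumps(mutant))
```

Each loop iteration applies exactly one mutation to the valid seed input, so a failure in the program under test can be traced back to a single, named deviation from the specification.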

Files

High_Level_Mutations_for_JSON_... (pdf)

(pdf | 0.711 Mb)

License info not available