TabFuzz: High-level mutations for tabular data

Abstract

Big Data is an expanding industry, yet exhaustive, automated testing of Big Data applications is still in its early stages. In the last few years, testing frameworks for Big Data applications have started to appear. BigFuzz is one such tool: it applies fuzz testing to Big Data applications. Fuzz testing means generating random, potentially invalid or erroneous inputs in an attempt to trigger exceptions. This paper introduces TabFuzz, a tool that reproduces the BigFuzz implementation and extends it by improving the generation of random input files. TabFuzz can generate a valid input file from an input specification and then mutate this file using high-level mutations. These mutations produce new test inputs that mimic real-world problems, which is an improvement over random bit-level or byte-level mutations. Most fuzzing programs start from a user-defined initial input file, called a seed file. TabFuzz offers the possibility to generate such a file instead. This research shows that starting from these generated files is just as effective as starting from a seed file.
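
To make the idea of specification-based generation followed by high-level mutation more concrete, the following is a minimal Python sketch, not the TabFuzz implementation. The specification format (`SPEC`), the function names, and the four mutations shown (type change, empty value, removed field, added field) are all assumptions chosen for illustration; the abstract does not list the mutations that TabFuzz actually applies.

```python
import copy
import random

# Hypothetical input specification: (column name, kind, lower bound, upper bound).
SPEC = [
    ("id", "int", 1, 1000),
    ("age", "int", 0, 120),
    ("score", "float", 0.0, 1.0),
]


def generate_table(spec, n_rows, rng):
    """Generate a valid table: every cell matches its column's type and range."""
    table = []
    for _ in range(n_rows):
        row = []
        for _name, kind, low, high in spec:
            if kind == "int":
                row.append(str(rng.randint(low, high)))
            else:
                row.append(f"{rng.uniform(low, high):.3f}")
        table.append(row)
    return table


# Illustrative high-level mutations. Each one edits the table in place and
# models a data-quality problem rather than flipping arbitrary bits or bytes.
def change_type(table, rng):
    """Replace one numeric cell with a non-numeric string."""
    row = rng.choice(table)
    row[rng.randrange(len(row))] = "abc"


def empty_value(table, rng):
    """Blank out one cell, as if the value were missing."""
    row = rng.choice(table)
    row[rng.randrange(len(row))] = ""


def remove_field(table, rng):
    """Delete one cell so that the row has too few columns."""
    row = rng.choice(table)
    del row[rng.randrange(len(row))]


def add_field(table, rng):
    """Insert one extra cell so that the row has too many columns."""
    row = rng.choice(table)
    row.insert(rng.randrange(len(row) + 1), "extra")


MUTATIONS = [change_type, empty_value, remove_field, add_field]


def mutate(table, rng):
    """Return a mutated copy of the table; the original is left untouched."""
    mutated = copy.deepcopy(table)
    rng.choice(MUTATIONS)(mutated, rng)
    return mutated


def to_csv(table):
    """Serialize the table as comma-separated lines."""
    return "\n".join(",".join(row) for row in table)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so that runs are repeatable
    seed_table = generate_table(SPEC, n_rows=3, rng=rng)
    print(to_csv(seed_table))
    print("--- mutated ---")
    print(to_csv(mutate(seed_table, rng)))
```

In this sketch the generated table plays the role of the seed file: a fuzzing loop would call `mutate` repeatedly and feed each serialized result to the application under test. Because every mutation keeps the input recognizably tabular, a failure points to a plausible data problem (a wrong type, a missing value, a malformed row) rather than to an arbitrarily corrupted file.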