Test Program-Based Generative Fuzzing for Differential Testing of the Kotlin Compiler

More Info
expand_more

Abstract

Kotlin is a programming language best known for its interoperability with Java, as well as the measurable improvements it offers over it. Since it became Android’s go-to language in 2019, the popularity and impact of Kotlin have risen greatly. Amidst this surge in popularity, the Kotlin developer team is working on a new version of the compiler that introduces sweeping changes to the ecosystem. Traditional compiler testing is a manual and laborious task that requires extensive developer effort and expertise. In an attempt to mitigate this, researchers have invested great resources in developing and perfecting automated compiler testing tools over the last decades. These approaches generate new pieces of code to test the behavior of compilers, which is assessed through differential testing. However, the usage of heuristics as guidance for the generative process is not well understood, and no approach that generates Kotlin code from scratch currently exists. In this thesis, we propose a novel method of enriching standard grammar specifications with language-targeted semantic context that is integrated in the sampling process. We structure generated code hierarchically and use it as the base of an evolutionary computation framework. Within this framework, we introduce two classes of algorithms that are novel to the field of compiler fuzzing, based on syntactic diversity and semantic proximity, respectively. We carry out an empirical analysis spanning 200K generated Kotlin files, which we analyzed through different Kotlin compiler versions. Our results uncovered five previously unreported categories of bugs, which we reported to the Kotlin compiler developer team. The developers verified and replicated our instances on the current re- lease of the Kotlin compiler, and have assigned target release dates for fixes within the current major version of the compiler. The study also provides new insight into the effects of heuristic-specific hyperparameters such as expression simplicity, dissimilarity measurements, and target selection.