Combining learning with fuzzing for software deobfuscation


Abstract

Software obfuscation is widely applied to prevent reverse engineering of applications. However, to evaluate security and validate behaviour, we are interested in analysing such software. In this thesis, we give an overview of available obfuscation techniques, as well as methods to undo this effort through reverse engineering and deobfuscation. We research active learning, which can be used to automatically learn state machine models of obfuscated software. These state machine models give insight into the behaviour of the program. We identify opportunities to improve the quality of existing active learning algorithms through the use of fuzzing to generate test cases. We utilise the AFL fuzzer, which uses a genetic algorithm in combination with test case mutation to create test cases for a target program. By using insight into the program's execution for each test case, it can create more relevant test cases than approaches that do not use this information. We use the generated test cases to find counterexamples for learned state machine models; these counterexamples can then be used by the learning algorithm to significantly improve the quality of the learned model. Compared to active learning with the W-method for test case generation, our combination of learning and fuzzing learns models of obfuscated programs with up to 343x more states, and consequently incorporates more of the program's behaviour into the learned state machine model.
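The core idea described above can be sketched as follows: fuzzer-generated test cases are run against both the target program and the learned state machine model, and any input on which the two disagree is a counterexample that the learning algorithm can use to refine the model. The sketch below is illustrative only, with a toy mutation loop standing in for AFL; all function names, the model encoding, and the mutation operators are assumptions, not the implementation used in the thesis.

```python
import random

def run_model(model, inputs):
    """Run a learned Mealy-style model: state -> {symbol: (next_state, output)}."""
    state, outputs = "q0", []
    for sym in inputs:
        state, out = model[state].get(sym, (state, None))
        outputs.append(out)
    return outputs

def find_counterexample(program, model, seeds, alphabet, rounds=1000, rng=None):
    """Toy stand-in for AFL: mutate seed inputs and compare program vs model.

    Returns the first input sequence on which the program's outputs differ
    from the model's outputs (a counterexample), or None if none is found.
    """
    rng = rng or random.Random(0)
    corpus = [list(s) for s in seeds]
    for _ in range(rounds):
        case = list(rng.choice(corpus))
        # Mutate one position: replace, insert, or delete a symbol.
        op = rng.randrange(3)
        pos = rng.randrange(len(case) + 1)
        if op == 0 and case:
            case[min(pos, len(case) - 1)] = rng.choice(alphabet)
        elif op == 1:
            case.insert(pos, rng.choice(alphabet))
        elif case:
            del case[min(pos, len(case) - 1)]
        if program(case) != run_model(model, case):
            return case          # counterexample: feed back to the learner
        corpus.append(case)      # keep the mutated input as a new seed
    return None
```

A real implementation would keep only inputs that increase code coverage (as AFL does) rather than every mutation, which is what makes the generated test cases more relevant than blind random testing.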