Combining learning with fuzzing for software deobfuscation


Abstract

Software obfuscation is widely applied to prevent reverse engineering of applications. However, to evaluate security and validate behaviour, we are interested in analysing such software. In this thesis, we give an overview of available obfuscation techniques, as well as methods to undo this effort through reverse engineering and deobfuscation. We research active learning, which can be used to automatically learn state machine models of obfuscated software. These state machine models give insight into the behaviour of the program. We identify opportunities to improve the quality of existing active learning algorithms through the use of fuzzing to generate test cases. We utilise the AFL fuzzer, which uses a genetic algorithm in combination with test case mutation to create test cases for a target program. By using insight into the program's execution for each test case, it can create more relevant test cases than approaches that do not use this information. We use the generated test cases to find counterexamples for learned state machine models; these counterexamples can then be used by the learning algorithm to significantly improve the quality of the learned model. Compared to active learning with the W-method for test case generation, our combination of learning and fuzzing learns models of obfuscated programs with up to 343x more states, and consequently incorporates more of the program's behaviour into the learned state machine model.
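The core idea described above can be sketched as follows: fuzzer-generated test cases are run against both the target program and the learned state machine model, and any input on which the two disagree is a counterexample that the learning algorithm can use to refine the model. The sketch below is illustrative only, with a toy mutation loop standing in for AFL; all function names, the model encoding, and the mutation operators are assumptions, not the implementation used in the thesis.

```python
import random

def run_model(model, inputs):
    """Run a learned Mealy-style model: state -> {symbol: (next_state, output)}."""
    state, outputs = "q0", []
    for sym in inputs:
        state, out = model[state].get(sym, (state, None))
        outputs.append(out)
    return outputs

def find_counterexample(program, model, seeds, alphabet, rounds=1000, rng=None):
    """Toy stand-in for AFL: mutate seed inputs and compare program vs model.

    Returns the first input sequence on which the program's outputs differ
    from the model's outputs (a counterexample), or None if none is found.
    """
    rng = rng or random.Random(0)
    corpus = [list(s) for s in seeds]
    for _ in range(rounds):
        case = list(rng.choice(corpus))
        # Mutate one position: replace, insert, or delete a symbol.
        op = rng.randrange(3)
        pos = rng.randrange(len(case) + 1)
        if op == 0 and case:
            case[min(pos, len(case) - 1)] = rng.choice(alphabet)
        elif op == 1:
            case.insert(pos, rng.choice(alphabet))
        elif case:
            del case[min(pos, len(case) - 1)]
        if program(case) != run_model(model, case):
            return case          # counterexample: feed back to the learner
        corpus.append(case)      # keep the mutated input as a new seed
    return None
```

A real implementation would keep only inputs that increase code coverage (as AFL does) rather than every mutation, which is what makes the generated test cases more relevant than blind random testing.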