Obfuscation detection in Android applications using deep learning

Abstract

Malware is often hidden in illegitimately cloned software. Android, with over two billion active devices, is one of the most affected platforms because code cloning is quite simple and there are several uncontrolled markets. Obfuscation is both a cause of and a solution to this problem: a cause because obfuscated malware is harder to detect, a solution because obfuscating legitimate applications makes code cloning more difficult. A deeper understanding of obfuscation techniques would lead to their more effective and informed use. The literature offers few obfuscation detection methods, and those have limited accuracy. Manual reverse engineering is too time-consuming for this purpose; faster, automated techniques are needed. In this work, we propose several deep learning models that detect and classify the presence of obfuscation in Android applications. In addition to classical ML methods, we leverage natural language processing and image recognition approaches, then combine them in a hybrid model that exploits the best of each approach. Tests over a large dataset, built using different obfuscation tools, showed improvements over previous obfuscation detection methods. We target four obfuscation classes (identifier renaming, string encryption, reflection, and class encryption), achieving an average F-measure of 0.985.
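To make the reported metric concrete, the following minimal sketch shows one way an average F-measure over the four obfuscation classes could be computed. It assumes a multi-label setting (an application may exhibit several obfuscation classes at once) and an unweighted mean of the per-class scores; the abstract does not state which averaging scheme was used. The class identifiers, function names, and toy labels are all hypothetical and are not taken from the paper's dataset or results.

```python
from typing import Dict, List, Set

# Hypothetical identifiers for the four obfuscation classes named in the abstract.
CLASSES = ["identifier_renaming", "string_encryption", "reflection", "class_encryption"]


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_class_f_measure(y_true: List[Set[str]], y_pred: List[Set[str]]) -> Dict[str, float]:
    """Compute one F-measure per class, treating each class as a binary label."""
    scores = {}
    for cls in CLASSES:
        tp = sum(1 for t, p in zip(y_true, y_pred) if cls in t and cls in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if cls not in t and cls in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if cls in t and cls not in p)
        scores[cls] = f_measure(tp, fp, fn)
    return scores


def average_f_measure(scores: Dict[str, float]) -> float:
    """Unweighted (macro) mean of the per-class scores; the averaging scheme is an assumption."""
    return sum(scores.values()) / len(scores)


# Toy labels for four hypothetical applications (illustrative only, not real data).
toy_true = [{"identifier_renaming"}, {"string_encryption", "reflection"}, {"class_encryption"}, set()]
toy_pred = [{"identifier_renaming"}, {"string_encryption"}, {"class_encryption"}, {"reflection"}]

toy_scores = per_class_f_measure(toy_true, toy_pred)
toy_average = average_f_measure(toy_scores)
print(toy_scores)
print(toy_average)  # 0.75 for this toy data: reflection scores 0.0, the other classes 1.0
```

With this scheme a single poorly detected class pulls the average down noticeably, so a high average implies that every class is detected well.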