Obfuscation detection in Android applications using deep learning

Journal article (2022)

Authors

M. Conti (Università degli Studi di Padova)

P. Vinod (Cochin University of Science and Technology)

Alessio Vitella (Università degli Studi di Padova)

Affiliation

External organisation

Deep learning, Convolutional Neural Network, Natural language processing, Obfuscation, Android malware

To reference this document use:

http://resolver.tudelft.nl/uuid:fd553ba9-7573-4465-9b71-4236a6267b4a

Published Date

2022

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Malware is often hidden in illegitimately cloned software. Android, with over two billion active devices, is one of the most affected platforms because code cloning is relatively simple and several app markets are unregulated. Obfuscation is both a cause of and a solution to this problem: a cause because obfuscated malware is harder to detect, a solution because obfuscating legitimate applications makes code cloning more difficult. A deeper understanding of obfuscation techniques would lead to their more effective and informed use. The literature offers only a few obfuscation detection methods, and their accuracy is limited. Manual reverse engineering is too time-consuming for this purpose, so faster, automated techniques are needed. In this work, we propose several deep learning models that detect and classify obfuscation in Android applications. In addition to classical machine learning (ML) methods, we leverage natural language processing or image recognition approaches; a hybrid model then exploits the best of each. Tests on a large dataset, built using different obfuscation tools, show improvements over previous obfuscation detection methods. We target four obfuscation classes: identifier renaming, string encryption, reflection, and class encryption, achieving an average F-measure of 0.985.
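
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an average F-measure over the four obfuscation classes could be computed. The abstract does not say which averaging scheme was used, so the unweighted (macro) average here is an assumption. The function names and the counts in toy_counts are hypothetical illustrations and are not results from the paper.

```python
from typing import Dict, Tuple


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F-measure (harmonic mean of precision and recall) from raw counts.

    Uses the equivalent form 2*TP / (2*TP + FP + FN) and returns 0.0
    when the score is undefined (no positives predicted or present).
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f_measure(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of the per-class F-measures (assumed averaging scheme)."""
    return sum(f_measure(*c) for c in counts.values()) / len(counts)


# Hypothetical (tp, fp, fn) counts per class, for illustration only.
toy_counts = {
    "identifier_renaming": (90, 10, 10),
    "string_encryption": (80, 20, 0),
    "reflection": (50, 0, 50),
    "class_encryption": (100, 0, 0),
}

if __name__ == "__main__":
    for name, c in toy_counts.items():
        print(f"{name}: {f_measure(*c):.3f}")
    print(f"macro average: {macro_f_measure(toy_counts):.3f}")
```

A macro average treats every obfuscation class equally regardless of how many samples it has. A weighted or micro average would give different values on an imbalanced dataset.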