Clustering Malware's Network Behavior using Simple Sequential Features

Abstract

Developing malware variants is extremely cheap for attackers because obfuscation tools are widely available. These variants can be grouped into malware families based on information obtained from static and dynamic analysis. Dynamic, network-level analysis reveals a malware sample's core behavior, since it captures the sample's interaction with its developer. Increasingly, Deep Packet Inspection (DPI) is being used to cluster malware’s network behavior. However, DPI has severe privacy implications, as it involves inspecting the payloads of network traffic.

This thesis presents an exploratory study that aims to characterize and cluster malware behavior using high-level, non-privacy-invasive, sequential features extracted from its network activity. The key intuition behind the proposed solution is that if distinct malware samples share a similar underlying infrastructure, the order in which they perform certain actions should also be similar. The results show that clustering on sequential features yields flexible and robust clusters, in contrast to clustering on non-sequential features. The clusters themselves reveal interesting attack capabilities, such as port scans, as well as cases where the same Command and Control server responds to different malware families. Lastly, a comparison with clusters obtained from static analysis shows that network-based clustering is far better suited to identifying the many behaviors exhibited by a single malware family, as well as behaviors common across multiple malware families.
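
The intuition about ordered actions can be illustrated with a minimal sketch. It is not the method used in the thesis: the abstract does not name the features, the distance measure, or the clustering algorithm. The sketch assumes that each sample is summarized as a sequence of packet sizes (a payload-free feature), that sequences are compared with dynamic time warping (DTW), and that samples are grouped by single-linkage thresholding. The sample names, values, and threshold are hypothetical.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    # prev[j] holds the best alignment cost of the processed prefix of a with b[:j].
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def cluster_sequences(sequences, threshold):
    """Group samples whose DTW distance is at most `threshold` (single linkage)."""
    names = list(sequences)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for u, v in combinations(names, 2):
        if dtw_distance(sequences[u], sequences[v]) <= threshold:
            parent[find(u)] = find(v)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical packet-size sequences (bytes), one per malware sample.
samples = {
    "sample_a": [60, 60, 1500, 1500, 60],
    "sample_b": [60, 1500, 1500, 1500, 60],
    "sample_c": [40, 40, 40, 40, 40, 40],  # scan-like: many small packets
    "sample_d": [40, 40, 40, 40],
}

clusters = cluster_sequences(samples, threshold=100)
```

DTW is chosen here only because it tolerates sequences of different lengths and small timing shifts, so two samples that perform the same steps at slightly different paces still land in the same cluster.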