Learning State Machines from data streams and an application in network-based threat detection

Abstract

Our increasingly interconnected society poses large risks in terms of cyber security. With network traffic volumes increasing and systems becoming more connected, maintaining visibility on IT networks is a challenging yet important task. In recent years the number of cyber threats has increased dramatically. Monitoring and threat detection are more essential than ever to stay in control in a growing threat landscape. The powerful properties of state machines, and the similarities between network traffic and the traces used to learn state machines, make state machine learning a promising approach. Current learning methods, however, maintain an intermediate data structure that is converted into a state machine only after all data has been processed. The continuous nature of network traffic makes this conventional approach inapplicable. This thesis provides a solution by developing a method for learning state machines on real-time data streams. The proposed algorithm, framework and implementation are generic and can be applied to any use case that benefits from learning state machines on data streams. This thesis explores one specific use case: the use of state machine fingerprints in network-based threat detection. A system is designed that is capable of learning state machines on real-time traffic channels. The proposed detection method is demonstrated to be highly effective in matching traffic from various malware types to pre-learned fingerprints. The work in this thesis forms a stepping stone to the development of a robust detection method, capable of detecting a variety of threats on network data with low false alarm rates.