Behavioral Clustering of Non-Stationary IP Flow Record Data

More Info
expand_more

Abstract

Automated network traffic analysis using machine learning techniques plays an important role in managing networks and IT infrastructure. A key challenge to the correct and effective application of machine learning is dealing with non-stationary learning data sources and concept drift. Traffic evolves overtime due to new technology, software, services being used, changes in user behavior but also due to changes in network graphs like dynamic IP address assignment. In this paper, we present an automatic online method to detect changepointsin network traffic based on IP flow record analysis. This technique is used to segment an observed behavior into smaller consecutive behaviors differing one from another. The segmented traffic is used to learn small communication profile characterizing accurately the activities present between two observed changepoints. We validate our method using synthetic data and outlinea real-world application to botnet hosts behavior modeling.