Automated data exfiltration detection using netflow metadata

Abstract

The volume and sophistication of data exfiltration attacks over networks have increased significantly in the last decade. This has created a need for defense mechanisms that can effectively detect both known and unknown data exfiltration scenarios. Deep packet inspection (DPI) is commonly used to detect data exfiltration, but it requires a thorough inspection of every payload or packet leaving the network. This makes it resource intensive and raises serious data privacy concerns, so it is unsuitable for some environments. In our work, we use lightweight, non-privacy-invasive netflow records to detect data exfiltration at connection-level granularity. The key intuition behind our proposed solution is that connections involved in data exfiltration tend to differ from normal network connections in certain feature values. Our results show that features extracted from netflows, such as the duration of a netflow, the source bytes, the source bytes sent per second, the source bytes sent per packet and the producer-consumer ratio, can be used to effectively detect data exfiltration. Connections are grouped using k-means, and the robust Z-score of each connection's distance from its cluster centroid serves as a statistical, distance-based technique for detecting connections involved in data exfiltration. While this method detects some data exfiltration scenarios, it produces a significant number of false positives. Combining it with the results of the local outlier factor (LOF) and the local outlier probability (LoOP), which are density-based techniques, yields a more robust model, as it significantly reduces the number of false positives and false negatives.
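The Python sketch below illustrates the distance-based step under several assumptions that are not taken from the study: bytes per second is source bytes divided by duration, bytes per packet is source bytes divided by source packets, the producer-consumer ratio is (source bytes - destination bytes) / (source bytes + destination bytes), features are z-score scaled before clustering, k-means uses a deterministic farthest-first initialization, and the robust Z-score is the common modified z-score 0.6745 * (x - median) / MAD with a 3.5 cut-off, computed over all connections. All function names (`flow_features`, `standardize`, `kmeans`, `robust_z`, `flag_by_centroid_distance`) are hypothetical.

```python
import math
import statistics


def flow_features(duration_s, src_bytes, dst_bytes, src_packets):
    """Feature vector for one netflow record (feature definitions are assumed)."""
    bytes_per_sec = src_bytes / duration_s if duration_s > 0 else float(src_bytes)
    bytes_per_pkt = src_bytes / src_packets if src_packets > 0 else 0.0
    total = src_bytes + dst_bytes
    # Producer-consumer ratio: +1 means upload only, -1 means download only.
    pcr = (src_bytes - dst_bytes) / total if total > 0 else 0.0
    return [duration_s, src_bytes, bytes_per_sec, bytes_per_pkt, pcr]


def standardize(rows):
    """Scale every feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iters=100):
    """Lloyd's k-means with a deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ever becomes empty.
            updated.append([statistics.fmean(col) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:  # assignments are stable
            break
        centroids = updated
    return [nearest(p, centroids) for p in points], centroids


def robust_z(values):
    """Modified z-score: 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values]) or 1e-12
    return [0.6745 * (v - med) / mad for v in values]


def flag_by_centroid_distance(points, k, threshold=3.5):
    """Flag points whose distance to their own centroid is a robust outlier."""
    labels, centroids = kmeans(points, k)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    return [z > threshold for z in robust_z(dists)]
```

In use, each netflow record would be turned into a row with `flow_features`, the rows scaled with `standardize`, and the result passed to `flag_by_centroid_distance` with a chosen `k`. The farthest-first initialization keeps the sketch deterministic, but it tends to pick extreme connections as seeds; a connection isolated in its own cluster has a centroid distance of zero and is missed by this score, which is one limitation that density-based scores can help cover.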
We also show that analyzing only the smallest clusters formed by k-means yields detection results similar to those obtained on the entire dataset, with a significant reduction in computation time.
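A minimal Python sketch of restricting the analysis to the smallest clusters, assuming cluster labels from an earlier k-means run are already available. It selects the members of the `n_clusters` least-populated clusters and scores only those with a brute-force local outlier factor (LOF); LoOP is not shown, and the function names and parameters (`smallest_cluster_members`, `lof`, `lof_on_smallest_clusters`, `n_clusters`, `k`) are hypothetical.

```python
import math
from collections import Counter


def smallest_cluster_members(labels, n_clusters):
    """Indices of points in the n_clusters least-populated clusters."""
    sizes = Counter(labels)
    ranked = sorted(sizes.items(), key=lambda kv: (kv[1], kv[0]))
    smallest = {c for c, _ in ranked[:n_clusters]}
    return [i for i, lab in enumerate(labels) if lab in smallest]


def lof(points, k):
    """Brute-force local outlier factor; scores well above 1 suggest outliers.

    Assumes len(points) > k.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    k_dist, neigh = [], []
    for i in range(n):
        ranked = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = ranked[k - 1][0]
        k_dist.append(kd)
        neigh.append([j for d, j in ranked if d <= kd])  # ties are included
    lrd = []
    for i in range(n):
        # Reachability distance to neighbour j: max(k-distance(j), d(i, j)).
        reach = sum(max(k_dist[j], dist[i][j]) for j in neigh[i])
        lrd.append(len(neigh[i]) / max(reach, 1e-12))  # guard duplicate points
    return [sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i]) for i in range(n)]


def lof_on_smallest_clusters(points, labels, n_clusters, k):
    """Map original point index -> LOF score, computed on the small clusters only."""
    idx = smallest_cluster_members(labels, n_clusters)
    scores = lof([points[i] for i in idx], k)
    return dict(zip(idx, scores))
```

Because this brute-force LOF builds a full pairwise distance matrix, its cost grows quadratically with the number of points analyzed, so shrinking the input to the smallest clusters reduces computation sharply in this sketch.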