Covert DNS Storage Channel Detection

van Hal, S.R.P.

Uncovering surreptitious data exchange using the phonebook of the internet

Master thesis (2021)

Authors

S.R.P. van Hal, Electrical Engineering, Mathematics and Computer Science

Contributors

S.E. Verwer, Cyber Security (mentor)

Reginald L. Lagendijk, Cyber Security (graduation committee member)

Maurício Aniche, Software Engineering (graduation committee member)

B. Vermeulen (mentor)

Faculty

Electrical Engineering, Mathematics and Computer Science

Machine Learning, Anomaly Detection, Classification, DNS storage channels

To reference this document use:

http://resolver.tudelft.nl/uuid:df016cc5-bd42-4c01-b16d-6d4889246861

More Info

Published Date

14-07-2021

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The cyber arms race keeps red and blue teams continuously on their toes to stay ahead. Increasingly capable cyber actors breach secure networks on a worrying scale. While network monitoring and analysis should identify blatant data exfiltration attempts, covert channels bypass these measures and facilitate surreptitious information extraction. The many legitimate uses and widespread availability of DNS, the "phone book" of the internet, make it an attractive protocol for such covert channels. Covert DNS storage channels encode information in the payload of outbound DNS queries.

This thesis aims to assess the effectiveness of using machine learning methods to detect covert DNS storage channels. Our literature survey identified distinct differences in 1) algorithm type, either unsupervised anomaly detection or supervised classification, and 2) the information source for features, either isolated DNS queries or query sequences.

We performed experiments with (Extended) Isolation Forest algorithms for anomaly detection and Random Forests for classification, combined with different feature set compositions to evaluate their relative performance. Payload-only features were derived from isolated queries and behavioral features were extracted from time-based or fixed-length sliding windows over per-domain query sequences. We evaluated our models using a large-scale corporate DNS dataset of real-world proportions and a novel dataset of connection tunneling traffic and simulated credit card exfiltration malware.
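To make the two feature sources concrete, the following minimal sketch computes payload-only features from a single query name and behavioral features from a fixed-length sliding window over per-domain query sequences. It is an illustration only: the feature names (`entropy`, `digit_ratio`, `unique_ratio`, `mean_length`), the window size, and the assumption that the last two labels form the domain are hypothetical choices, not the feature set used in the thesis.

```python
import math
from collections import Counter, defaultdict, deque


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def split_query(qname, domain_labels=2):
    """Split a query name into (payload, domain).

    Assumption: the last `domain_labels` labels form the domain. A real
    system would consult the Public Suffix List instead.
    """
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[:-domain_labels]), ".".join(labels[-domain_labels:])


def payload_features(qname):
    """Payload-only features derived from one isolated query."""
    payload, _ = split_query(qname)
    return {
        "length": len(payload),
        "labels": payload.count(".") + 1 if payload else 0,
        "entropy": shannon_entropy(payload.replace(".", "")),
        "digit_ratio": sum(c.isdigit() for c in payload) / len(payload) if payload else 0.0,
    }


def windowed_features(qnames, window=3):
    """Yield (domain, features) for each query once that domain's window is full."""
    windows = defaultdict(lambda: deque(maxlen=window))
    for qname in qnames:
        payload, domain = split_query(qname)
        win = windows[domain]
        win.append(payload)
        if len(win) == window:
            yield domain, {
                # Share of distinct payloads: near 1.0 when every query carries new data.
                "unique_ratio": len(set(win)) / window,
                "mean_length": sum(len(p) for p in win) / window,
            }
```

Feature vectors of this kind could then be passed to an anomaly detector or a classifier; the choice of model is independent of how the features are extracted.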

We found that most experiments achieved detection rates of 98.6% or higher on a variety of storage channel threats, at low false positive rates. Classification models significantly outperformed anomaly detection models on threats seen during training. Evaluation on unseen threats, however, revealed that generalization is difficult given the limited set of training threats, and showed that anomaly detection models are more capable than classification models of detecting a variety of threats. We furthermore showed that feature sets with a behavioral component consistently outperform payload-only features, although our experiments were inconclusive regarding the relative performance of the composite feature sets.
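For reference, the two metrics quoted above can be computed from labeled predictions as in the sketch below. It assumes the usual definitions (detection rate as true positive rate, false positive rate as the share of benign samples flagged) and a label convention of 1 for malicious and 0 for benign; the function name is hypothetical and the thesis may compute its figures differently.

```python
def detection_metrics(y_true, y_pred):
    """Return (detection_rate, false_positive_rate) for binary labels.

    Assumed convention: 1 = malicious, 0 = benign.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # threats caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # threats missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate
```

Because benign DNS traffic vastly outnumbers malicious traffic in a corporate setting, even a small false positive rate can translate into a large absolute number of alerts, which is why the two figures are reported together.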

Given the prevalence of benign storage channels misusing DNS for legitimate data transfer, we recommend rigorous filtering of training data beforehand to improve model optimization and evaluation. Furthermore, extending the malicious training set with DNS command-and-control (C2) malware is a promising future research direction to improve generalization of classification models.

Files

MScThesis_DNSStorageChannels_s... (pdf)

(pdf | 11 Mb)

License info not available