S.R.P. van Hal
Please Note
This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.
2 records found
1
Covert DNS Storage Channel Detection
Uncovering surreptitious data exchange using the phonebook of the internet
Master thesis
(2021)
S.R.P. van Hal, S.E. Verwer, R.L. Lagendijk, M. Finavaro Aniche, B. Vermeulen
The cyber arms race keeps red and blue teams continuously on their toes to stay ahead. Increasingly capable cyber actors breach secure networks at a worrying scale. While network monitoring and analysis should identify blatant data exfiltration attempts, covert channels bypass these measures and facilitate surreptitious information extraction. The many legitimate uses and widespread availability of DNS, the "phone book" of the internet, make it an attractive protocol for such covert channels. Covert DNS storage channels encode information in the payload of outbound DNS queries.
This thesis aims to assess the effectiveness of using machine learning methods to detect covert DNS storage channels. Our literature survey identified distinct differences in 1) algorithm type, either unsupervised anomaly detection or supervised classification, and 2) the information source for features, either isolated DNS queries or query sequences.
We performed experiments with (Extended) Isolation Forest algorithms for anomaly detection and Random Forests for classification, combined with different feature set compositions to evaluate their relative performance. Payload-only features were derived from isolated queries and behavioral features were extracted from time-based or fixed-length sliding windows over per-domain query sequences. We evaluated our models using a large-scale corporate DNS dataset of real-world proportions and a novel dataset of connection tunneling traffic and simulated credit card exfiltration malware.
We found that the majority of experiments achieved high detection rates of 98.6% or more on a variety of storage channel threats, at low false positive rates. Classification models significantly outperform anomaly detection models on threats seen during training. Evaluation on unseen threats, however, revealed that generalization is difficult given the limited set of training threats, and showed that anomaly detection models are more capable of detecting a variety of threats than classification models. We furthermore showed that feature sets with a behavioral component consistently outperform payload-only features, although our experiments were inconclusive regarding the relative performance of composite feature sets.
Given the prevalence of benign storage channels misusing DNS for legitimate data transfer, we recommend rigorous filtering of training data beforehand to improve model optimization and evaluation. Furthermore, extending the malicious training set with DNS command-and-control (C2) malware is a promising future research direction to improve generalization of classification models.
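As an illustration of the two feature sources the abstract distinguishes, the following minimal Python sketch derives payload-only features from an isolated query and behavioral features from a fixed-length sliding window over per-domain query sequences. It is not the thesis implementation: the function names, the chosen features (length, label count, character entropy, unique-payload ratio), the window size, and the assumption that the registered domain is the last two labels of the query name are all hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict, deque


def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def split_query(qname, domain_labels=2):
    """Split a query name into (payload, domain).

    Assumption for this sketch: the domain is simply the last
    `domain_labels` labels; a real system would need a public
    suffix list to handle names such as example.co.uk.
    """
    labels = qname.rstrip(".").lower().split(".")
    domain = ".".join(labels[-domain_labels:])
    payload = ".".join(labels[:-domain_labels])
    return payload, domain


def payload_features(qname):
    """Payload-only features computed from a single, isolated query."""
    payload, _ = split_query(qname)
    return {
        "length": len(payload),
        "labels": payload.count(".") + 1 if payload else 0,
        "entropy": shannon_entropy(payload.replace(".", "")),
    }


def windowed_features(qnames, window=3):
    """Yield (domain, features) for each full fixed-length window.

    A separate sliding window of the last `window` payloads is kept
    per domain, so features describe behavior over a query sequence.
    """
    recent = defaultdict(lambda: deque(maxlen=window))
    for qname in qnames:
        payload, domain = split_query(qname)
        buf = recent[domain]
        buf.append(payload)
        if len(buf) == window:
            yield domain, {
                "unique_ratio": len(set(buf)) / window,
                "mean_length": sum(len(p) for p in buf) / window,
            }
```

Feature dictionaries like these could then be turned into vectors and passed to an anomaly detector or a classifier; the sketch deliberately stops before that step, since the abstract does not specify the model configuration.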
Cairo-based IT company Key2Soft is working on a comprehensive system to automate various systems in Egyptian primary, middle and high schools. This software system, named Key2School, includes a timetabling component, with which the company aims to relieve the workload of timetablers by providing them with a system that automatically generates timetables for all teachers, students and subjects. In consultation with the company, both functional requirements and timetable requirements have been composed for the timetabling part.
A literature study has been conducted to find and compare existing timetabling algorithms and libraries in order to select the best match for the company. All existing algorithms in the literature were found to be too slow, so a system has been designed around an open source timetabling program. This system contains a part where the program is managed, a part which interfaces with the database of Key2Soft and a part where the timetable resources are constructed in a compatible manner. The system has been implemented in consultation with programmers at Key2Soft and will be integrated in Key2School in the future. The system is programmed mainly in C# and uses XML files to configure the timetabling library. The system has been thoroughly tested with NUnit, a platform-specific unit testing library, which enabled the developers to verify the code quality. The code has furthermore been evaluated by the independent IT consultant SIG.