C.A. Hammerschmidt
12 records found
FlexFringe
Modeling software behavior by learning probabilistic automata
We present the efficient implementations of probabilistic deterministic finite automaton learning methods available in FlexFringe. These are well-known strategies for state merging, including several modifications to improve their performance in practice. We show experimentally that these algorithms obtain competitive results and significant improvements over a default implementation. We also demonstrate how to use FlexFringe to learn interpretable models from software logs and use these for anomaly detection. Finally, we show that learning smaller, more convoluted models, although these are less interpretable, improves the performance of FlexFringe on anomaly detection, making it competitive with an existing solution based on neural nets.
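To illustrate how a probabilistic automaton learned from logs can flag anomalies, here is a minimal sketch. It is not FlexFringe's API or algorithm: it builds only the unmerged prefix tree that state-merging methods typically start from, and it scores a trace by its average negative log-likelihood. The class name `PrefixTreeModel`, the smoothing parameter `alpha`, and the example log events are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class PrefixTreeModel:
    """Toy probabilistic prefix tree (hypothetical helper, no state merging)."""

    END = "<end>"  # synthetic symbol marking the end of a trace

    def __init__(self, alphabet_size, alpha=1.0):
        # counts[state][symbol] = how often `symbol` followed the prefix `state`
        self.counts = defaultdict(lambda: defaultdict(int))
        self.alphabet_size = alphabet_size
        self.alpha = alpha  # additive (Laplace) smoothing, an assumption

    def fit(self, traces):
        for trace in traces:
            state = ()
            for symbol in list(trace) + [self.END]:
                self.counts[state][symbol] += 1
                state = state + (symbol,)
        return self

    def score(self, trace):
        """Average negative log-likelihood per symbol; higher means more unusual."""
        state, nll = (), 0.0
        symbols = list(trace) + [self.END]
        for symbol in symbols:
            outgoing = self.counts.get(state, {})
            total = sum(outgoing.values())
            # +1 in the denominator accounts for the END symbol
            p = (outgoing.get(symbol, 0) + self.alpha) / (
                total + self.alpha * (self.alphabet_size + 1)
            )
            nll -= math.log(p)
            state = state + (symbol,)
        return nll / len(symbols)


normal = ["open", "read", "close"]
model = PrefixTreeModel(alphabet_size=3).fit([normal] * 50)
normal_score = model.score(normal)
odd_score = model.score(["close", "open", "open"])
```

A trace whose score exceeds a threshold chosen on held-out normal data would be reported as anomalous. State merging would additionally collapse similar prefixes into shared states, which is what makes the learned model smaller and more general than this tree.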
Beyond Labeling
Using Clustering to Build Network Behavioral Profiles of Malware Families
Malware family labels are known to be inconsistent. They are also black-box since they do not represent the capabilities of malware. The current state of the art in malware capability assessment includes mostly manual approaches, which are infeasible due to the ever-increasing volume of discovered malware samples. We propose a novel unsupervised machine learning-based method called MalPaCA, which automates capability assessment by clustering the temporal behavior in malware's network traces. MalPaCA provides meaningful behavioral clusters using only 20 packet headers. Behavioral profiles are generated based on the cluster membership of malware's network traces. A Directed Acyclic Graph (DAG) shows the relationship between malware samples according to their overlapping behaviors. The behavioral profiles together with the DAG provide a more insightful characterization of malware than current family designations. We also propose a visualization-based evaluation method for the obtained clusters to assist practitioners in understanding the clustering results. We apply MalPaCA to a financial malware dataset collected in the wild that comprises 1.1k malware samples resulting in 3.6M packets. Our experiments show that (i) MalPaCA successfully identifies capabilities, such as port scans and reuse of Command and Control servers; (ii) it uncovers multiple discrepancies between behavioral clusters and malware family labels; and (iii) it demonstrates the effectiveness of clustering traces using temporal features by producing an error rate of 8.3%, compared to 57.5% obtained from statistical features.
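The contrast between temporal and statistical features in point (iii) can be made concrete with a small sketch. The abstract does not name the distance measure used for clustering, so dynamic time warping (DTW) is used here purely as an assumed example of a sequence-aware distance; the function name `dtw_distance` and the packet-size values are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical packet-size sequences from three connections.
rising = [100, 200, 300]
rising_slow = [100, 100, 200, 200, 300, 300]
falling = [300, 200, 100]

same_shape = dtw_distance(rising, rising_slow)
different_shape = dtw_distance(rising, falling)
same_mean = sum(rising) / len(rising) == sum(falling) / len(falling)
```

The rising and falling sequences share the same mean, so a purely statistical summary cannot separate them, whereas the sequence-aware distance does. A pairwise distance matrix built this way could then be passed to any clustering algorithm that accepts precomputed distances.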
Cyber-attacks are becoming more sophisticated and complex, especially when adversaries steal user credentials to traverse the network of an organization. Detecting a breach is extremely difficult, as confirmed by the findings of studies on cyber-attacks against organizations. A study conducted last year by IBM found that it takes US companies 206 days on average to detect a data breach. As a consequence, the effectiveness of existing defensive tools is in question. In this work we deal with the detection of malicious authentication events, which are responsible for the effective execution of the stealthy attack called lateral movement. Authentication event logs produce a purely categorical feature space, which creates methodological challenges for developing outlier detection algorithms. We propose an auto semi-supervised outlier ensemble detector that does not leverage the ground truth to learn the normal behavior. The automatic nature of our methodology is supported by established unsupervised outlier ensemble theory. We test the performance of our detector on a real-world cyber security dataset provided publicly by the Los Alamos National Lab. Overall, our experiments show that our proposed detector outperforms existing algorithms, achieving a False Negative Rate of 0 (no malicious login event is missed) and a False Positive Rate that improves on the state of the art. In addition, by detecting malicious authentication events, in contrast to the majority of existing works, which focus solely on detecting malicious users or computers, we are able to provide insights regarding when and at which systems malicious login events happened. Beyond the application to a public dataset, we are working with our industry partner, POST Luxembourg, to deploy the proposed detector on their network.
Managed security service providers increasingly rely on machine-learning methods to go beyond traditional, signature-based threat detection and classification methods. As machine learning often improves with more data, smaller organizations and clients find themselves at a disadvantage: without the ability to share their data and without others willing to collaborate, their machine-learned threat detection will perform worse than the same model in a larger organization. We show that Federated Learning, i.e., collaborative learning without data sharing, helps to overcome this problem. Our experiments focus on a common task in cyber security: the detection of unwanted URLs in network traffic seen by security-as-a-service providers. Our experiments show that (i) smaller participants benefit from larger participants; (ii) participants seeing different types of malicious traffic generalize better to unseen types of attacks, increasing performance by 8% to 15% on average, and by up to 27% in the extreme case; and (iii) participating in Federated training never harms the performance of the locally trained model. In our experiment modeling a security-as-a-service setting, Federated Learning increased detection by up to 30% for some participants in the scheme. This clearly shows that Federated Learning is a viable approach to address issues of data sharing in common cyber security settings.
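The core idea of collaborative learning without data sharing can be sketched as a single aggregation round. The abstract does not state which aggregation rule was used, so the sample-weighted averaging below (commonly known as federated averaging) is an assumption; the function name `federated_average` and all numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local training-set size.

    Only model parameters leave each client; the raw data never does.
    """
    if len(client_weights) != len(client_sizes):
        raise ValueError("one size per client is required")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]


# Two hypothetical participants: a small one and a larger one.
small_client = [0.0, 2.0]   # parameters after local training
large_client = [4.0, 6.0]
global_model = federated_average([small_client, large_client], [1, 3])
```

In a full training loop, the server would send `global_model` back to every participant, each would continue training on its own traffic, and the round would repeat. Weighting by sample count lets larger participants contribute more, which is one way smaller participants can benefit from data they never see.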
flexfringe
A Passive Automaton Learning Package
Reliable Machine Learning for Networking
Key Issues and Approaches