GP

G. Pellegrino

info

Please Note

5 records found

Conference paper (2017) - Nino Pellegrino, Qin Lin, Christian Hammerschmidt, Sicco Verwer
We present a novel way to detect infected hosts and identify malware in networks by analyzing network communication statistics with state-of-the-art automata learning algorithms. The automata encode patterns of short-term interactions in known malicious hosts, and are used to obtain small but effective fingerprints of machine behavior. We showcase the effectiveness of our system, named BASTA1 (Behavioral Analytics System using Timed Automata), on a public dataset containing Netflow traces of real-world botnet malware. Compared to a deep packet inspection of communication content, Netflows are easy and cheap to collect and analyze, and preserve a greater degree of privacy. Even though the high level of abstraction in Netflow data makes it more difficult to utilize it, BASTA shows very impressive results achieving high accuracy in several settings while returning few false positives. It is also capable of detecting infections of previously unseen malware. ...
Journal article (2017) - Thijs Veugen, Jeroen Doumen, Zekeriya Erkin, Nino Pellegrino, Sicco Verwer, Jos Weber
We consider dynamic group services, where outputs based on small samples of privacy-sensitive user inputs are repetitively computed. The leakage of user input data is analysed, caused by producing multiple outputs, resulting from inputs of frequently changing sets of users. A cryptographic technique, known as random user selection, is investigated. We show the effect of random user selection, given different types of output functions, thereby disproving earlier work. A new security measure is introduced, which provably improves the privacy-preserving effect of random user selection, irrespective of the output function. We show how this new security measure can be implemented in existing cryptographic protocols. To investigate the effectiveness of our security measure, we conducted a couple of statistical simulations with large user populations, which show that it forms a key ingredient, at least for the output function addition. Without it, an adversary is able to determine a user input, with increasing accuracy when more outputs become available. When the security measure is implemented, an adversary remains oblivious of user inputs, even when thousands of outputs are collected. Therefore, our new security measure assures that random user selection is an effective way of protecting the privacy of dynamic group services. ...
Conference paper (2016) - Christian Hammerschmidt, Samuel Marchal, Radu State, Nino Pellegrino, Sicco Verwer
The task of network traffic monitoring has evolved drastically with the ever-increasing amount of data flowing in large scale networks. The automated analysis of this tremendous source of information often comes with using simpler models on aggregated data (e.g. IP flow records) due to time and space constraints. A step towards utilizing IP flow records more effectively are stream learning techniques. We propose a method to collect a limited yet relevant amount of data in order to learn a class of complex models, finite state machines, in real-time. These machines are used as communication profiles to fingerprint, identify or classify hosts and services and offer high detection rates while requiring less training data and thus being faster to compute than simple models. ...
Conference paper (2016) - Nino Pellegrino, Christian Hammerschmidt, Sicco Verwer, Qin Lin
We proposes an algorithm to learn automata innite alphabets, or at least too large to enumerate. We apply it to dene a generic model intended for regression, with transitions constrained by intervals over the alphabet. The algorithm is based on the Red & Blue framework for learning from an input sample. We show two small case studies where the alphabets are respectively the natural and real numbers, and show how nice properties of automata models like interpretability and graphical representation transfer to regression where typical models are hard to interpret. ...