Learning state machines from data streams: A generic strategy and an improved heuristic

Conference Paper (2023)

Author(s)

R. Baumgartner (TU Delft - Cyber Security)

Sicco Verwer (TU Delft - Cyber Security)

Research Group

Cyber Security

To reference this document use:

https://resolver.tudelft.nl/uuid:973e77eb-89b4-4568-bcb2-645b38503258

Publication Year

2023

Language

English

Volume number

217

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

State machine models simulate the behavior of discrete event systems. They can represent software systems, network interactions, and control systems, and have been researched extensively. Most learning algorithms, however, assume that all data is available at the beginning of the algorithm, and little research has been done on learning state machines from streaming data. In this paper, we aim to close this gap further by presenting a generic method for learning state machines from data streams, as well as a merge heuristic that uses sketches to account for incomplete prefix trees. We implement our approach in an open-source state merging library and compare it with existing methods. We show the effectiveness of our approach with respect to run-time, memory consumption, and quality of results on a well-known open dataset.

Files

Baumgartner23a.pdf

(pdf | 0.831 Mb)