SAGE is an unsupervised sequence learning pipeline that generates alert-driven attack graphs (AGs) without requiring prior expert knowledge of existing vulnerabilities or network topology. Using a suffix-based probabilistic deterministic finite automaton (S-PDFA), it accentuates infrequent high-severity alerts without discarding frequent low-severity alerts. It also captures the context of alerts with identical signatures, and the resulting model is interpretable. To handle infrequent data, SAGE uses sink states, which are not merged during the S-PDFA learning process. However, this can produce unnecessarily large AGs. In this study, we examine the AGs that result from merging sink states with other sinks and with the core of the S-PDFA after the main merging process. We use data from Collegiate Penetration Testing Competitions to compare AGs on four metrics: size, complexity, interpretability and completeness. We show that the resulting graphs are, on average, slightly smaller, with about the same complexity and the same completeness, but less interpretable because the context of attack episodes is lost. The slight reduction in size does not compensate for this loss.