Graph-aware anomalous network agent detection

More Info
expand_more

Abstract

Networks with a large number of participants and a highly dynamic data exchange are better off using a distributed networking system due to network failures in centralized networks. However, with the increase in distributed networking, security problems arise in distributed processes. Injection of malicious data, for example, must be dealt with by using the tools provided by detection theory. The detection probability pd can be taken as the performance metric that we aim to optimize. In order to achieve this, one must first define a hypothesis testing problem and derive an optimization problem for pd that is dependent on which nodes are assumed to be compromised by these malicious agents that inject data.

For the injected data, there are three different models taken into consideration for the change in values over time and nodes. For every model, we assume that the outlier data can be injected with two different attack modes. These attack modes enforce different network topology related constraints on the set of compromised nodes due to different motivations. Furthermore, additional constraints are assumed due to the limited resources of the agents.
Additionally, with the given framework, we can also derive an optimization problem that can be solved with the help of the well-known linear regression method, i.e., Lasso. The problem that arise with this method is the difficulty of implementing the network topology related constraints into this optimization problem.

In order to solve these optimization problems, several methods are combined for the relaxation and solution of the optimization problems.

From numerical evaluation, it can be observed that our empirical performance is non-negligibly lower than the theoretical performance for all three models and both attacking modes. This can be linked to the dependence of the empirical distribution to the derived subset of compromised nodes, these subsets are chosen such that the cost function is optimized. Hence, we observe that the empirical values are much higher than it is theoretically assumed, in case that the network is 'clean'.
A second factor for performance evaluation is the number of wrongly indexed nodes, it can be observed how this factor is dependent on the distribution of the energy of the outlier data over the nodes and time.

Overall, this study shows that the provided framework shows an increasing performance for an increasing outlier-to-noise energy ratio. For an energy ratio higher than 0.5, the empirical and theoretical ROC-curves are nearly perfectly saturated for all models. The number of wrongly indexed nodes for our methods is generally speaking lower compared with the Lasso-method.