Identifying Informative System Metrics: Predicting the predictability of time series using entropy

Abstract

Predictive models built on cloud metrics, for tasks such as incident prediction, are becoming ubiquitous in cloud monitoring. For such forecasting tasks, knowing beforehand which system metrics are predictable makes it easier to build good models. Quantifying the predictability of cloud metrics lets us rank the available system metrics and select the subset with the lowest complexity. Moreover, storing informative metrics for a longer period can result in better forecasting. This thesis presents a novel entropy method for quantifying the complexity of time series: Reverse weighted Dispersion Entropy (RWDE). We also present an exploratory case study, carried out at ING, a large banking company with an in-house cloud architecture, to understand and quantify the complexity of cloud metrics. We run experiments on simulated signals to compare RWDE with other entropy methods, and we apply RWDE to cloud metric data from ING to approximate the predictability of these metrics. The experimental results show that RWDE performs better than the other entropy methods and can be used to select informative cloud metrics for a forecasting task. Further, we establish a relationship between RWDE and the model-based predictability of cloud metrics: for each cloud metric, we compare RWDE with predictions from various forecasting models. Our results show that practitioners can use this relationship as a heuristic to identify unsuitable forecasting models for certain cloud metrics. We make RWDE and the other entropy methods discussed in this study available as an open-source Python package.
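To make the idea of ranking metrics by entropy concrete, the following is a minimal sketch of standard dispersion entropy, the family of measures that RWDE belongs to by name. It is not RWDE itself, whose weighting scheme is not described in this abstract, and it is not the API of the thesis's Python package. The function names `dispersion_entropy` and `rank_by_entropy`, and the defaults `m=2`, `c=6`, `delay=1`, are assumptions chosen for illustration.

```python
import math
from collections import Counter

import numpy as np


def dispersion_entropy(x, m=2, c=6, delay=1):
    """Normalized dispersion entropy in [0, 1].

    Illustrative only: this is plain dispersion entropy, not RWDE.
    m is the embedding dimension, c the number of classes, delay the time lag.
    """
    x = np.asarray(x, dtype=float)
    n_patterns = len(x) - (m - 1) * delay
    if n_patterns <= 0:
        raise ValueError("series is too short for the chosen m and delay")
    mu, sigma = x.mean(), x.std()
    if sigma == 0:
        return 0.0  # a constant series yields a single pattern
    # Map each sample into (0, 1) with the normal CDF fitted to the series.
    y = np.array(
        [0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0)))) for v in x]
    )
    # Assign each mapped sample to one of c integer classes, 1..c.
    z = np.clip(np.rint(c * y + 0.5), 1, c).astype(int)
    # Count how often each length-m pattern of classes occurs.
    span = (m - 1) * delay + 1
    counts = Counter(tuple(z[i : i + span : delay]) for i in range(n_patterns))
    p = np.array(list(counts.values()), dtype=float) / n_patterns
    # Shannon entropy of the pattern distribution, divided by its maximum.
    return float(-(p * np.log(p)).sum() / math.log(c**m))


def rank_by_entropy(signals, **kwargs):
    """Return signal names ordered from lowest to highest entropy."""
    return sorted(signals, key=lambda name: dispersion_entropy(signals[name], **kwargs))
```

Under these assumptions, a periodic signal should rank ahead of white noise, since it produces fewer distinct patterns. For instance, `rank_by_entropy({"sine": np.sin(2 * np.pi * np.arange(2000) / 50), "noise": np.random.default_rng(0).normal(size=2000)})` places `"sine"` first.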