Catching the response time tail in the cloud

Conference Paper (2015)
Author(s)

Sebastiano Spicuglia (University of Lugano)

Mathias Bjorkqvist (Zurich Lab)

Lydia Y. Chen (IBM Research - Zurich, Zurich Lab)

Walter Binder (University of Lugano)

Affiliation
External organisation
DOI related publication
https://doi.org/10.1109/INM.2015.7140339
Publication Year
2015
Language
English
Pages (from-to)
572-577
ISBN (electronic)
9783901882760

Abstract

As modern service systems are pressured to offer competitive prices through cost-effective capacity planning, especially in cloud computing, service level agreements (SLAs) are becoming ever more sophisticated, i.e., they set targets on different percentiles of response times. However, predicting even the average response time is difficult, both for real systems and for abstracted queueing systems that typically simplify system details, and managing SLAs defined by various response time percentiles is harder still. To capture these percentiles efficiently, we first develop a novel, autonomic methodology, termed Burst Based Simulation, which combines burst profiling on real systems with complex, state-dependent simulations. Building on this methodology, we then construct an analysis for SLA management: the prediction of SLA violations for a given request pattern. We evaluate our approach on two types of service systems, virtualized and bare-metal, across wide ranges of SLAs and traffic loads. Our evaluation results show that the methodology achieves an average error below 15% when predicting different response time percentiles, and that it accurately captures SLA violations.
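To make the two quantities in the abstract concrete, the following is a minimal Python sketch of a percentile-based SLA check and of an average prediction error over percentiles. It is not the paper's Burst Based Simulation, which this record does not describe in detail. The function names, the nearest-rank percentile definition, the SLA layout (a mapping from percentile to a response time target), and the use of mean relative error are all assumptions made for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it.

    Assumption: nearest-rank is used for simplicity; the paper may define
    percentiles differently.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def sla_violations(samples, sla):
    """Return {percentile: (observed, target)} for every violated target.

    `sla` is a hypothetical layout: percentile -> maximum allowed response time.
    """
    violations = {}
    for p, target in sla.items():
        observed = percentile(samples, p)
        if observed > target:
            violations[p] = (observed, target)
    return violations


def mean_relative_error(predicted, measured):
    """Average of |predicted - measured| / measured over the measured percentiles.

    Assumption: relative error is one plausible reading of "average error".
    """
    errors = [abs(predicted[p] - measured[p]) / measured[p] for p in measured]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Synthetic response times in milliseconds (illustrative data only).
    response_times_ms = list(range(1, 101))
    sla = {50: 60, 95: 90}
    print(sla_violations(response_times_ms, sla))
```

With response times of 1 to 100 ms, the 50th percentile target of 60 ms is met while the 95th percentile target of 90 ms is not, which shows how an SLA can hold on the median yet fail in the tail.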

Metadata only record. There are no files for this record.