Capelin: Fast Data-Driven Capacity Planning for Cloud Datacenters
Abstract
Cloud datacenters provide a backbone to our digital society. Capacity planning is crucial to meeting increasing demand while maintaining efficient operation. Inaccurate capacity planning for cloud datacenters can lead to significant performance degradation, denser targets for failure, and unsustainable energy consumption. Although this activity is core to improving cloud infrastructure, relatively few comprehensive approaches and support tools exist, leaving many planners with merely rule-of-thumb judgement.
We propose Capelin, a data-driven, scenario-based capacity planning system for cloud datacenters. We design Capelin to address requirements we have derived from a unique survey of experts in charge of diverse datacenters in several countries. The approach centers on the notion of portfolios of scenarios, which Capelin uses as a framework for probing alternative capacity plans, decisions, and courses of events. At the core of the system, a trace-based, discrete-event simulator enables the exploration of different possible topologies, with support for scaling the volume, variety, and velocity of resources, and for horizontal (scale-out) and vertical (scale-up) scaling. Capelin gives detailed quantitative operational information for each scenario, which could facilitate human decisions in capacity planning.
We implement Capelin and show through comprehensive trace-based experiments that it can aid practitioners. Although Capelin is designed to work across many kinds of datacenters, in this work we focus on private-cloud, business-critical workloads, and on public-cloud operations. The results give evidence that choices that seem reasonable and common in practice could be worse by a factor of 1.5-2.0 than the best, in terms of performance degradation or energy consumption. We also show evidence of Capelin identifying meaningful choices that are different from the baseline proposed by a team of professional datacenter engineers. We open-source Capelin and release data artifacts for public inspection and reuse.
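To make the idea of a portfolio of scenarios concrete, the following is a minimal illustrative sketch and not Capelin's implementation or API. It replays a tiny made-up trace through a toy discrete-event loop for three candidate topologies (a baseline, a scale-out variant, and a scale-up variant) and reports two per-scenario metrics. All names (`Topology`, `Scenario`, `simulate`, `TRACE`), the trace values, and the linear per-host power model are assumptions introduced only for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    hosts: int
    cores_per_host: int

    @property
    def capacity(self) -> int:
        return self.hosts * self.cores_per_host


@dataclass(frozen=True)
class Scenario:
    name: str
    topology: Topology


# Hypothetical workload trace: (arrival_time, duration, cores_requested).
TRACE = [(0, 10, 8), (2, 5, 16), (4, 4, 16), (12, 3, 4)]

# Assumed linear power model per host (idle and peak draw, in watts).
IDLE_W, PEAK_W = 100.0, 200.0


def simulate(trace, topo):
    """Replay the trace as arrival/departure events against one topology."""
    events = []
    for arrival, duration, cores in trace:
        heapq.heappush(events, (arrival, +cores))             # job starts
        heapq.heappush(events, (arrival + duration, -cores))  # job ends

    now, demand = 0, 0
    overcommit = 0.0  # core-time requested beyond capacity
    energy = 0.0      # watts * time units under the assumed power model
    while events:
        t, delta = heapq.heappop(events)
        dt = t - now
        util = min(demand, topo.capacity) / topo.capacity
        overcommit += max(0, demand - topo.capacity) * dt
        energy += topo.hosts * (IDLE_W + (PEAK_W - IDLE_W) * util) * dt
        now, demand = t, demand + delta
    return {"overcommitted_core_time": overcommit, "energy": energy}


# A portfolio is a set of scenarios evaluated against the same trace.
portfolio = [
    Scenario("baseline", Topology(hosts=2, cores_per_host=16)),
    Scenario("scale-out", Topology(hosts=3, cores_per_host=16)),
    Scenario("scale-up", Topology(hosts=2, cores_per_host=24)),
]

results = {s.name: simulate(TRACE, s.topology) for s in portfolio}
for name, metrics in results.items():
    print(name, metrics)
```

In this toy setup the baseline is overcommitted during the demand peak, while both expanded topologies absorb it at a higher energy cost. That is the kind of per-scenario trade-off a planner would compare, although a real evaluation would rely on production traces and a far more detailed simulator.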