G. Andreadis | TU Delft Repository

Capelin: Fast Data-Driven Capacity Planning for Cloud Datacenters

Master thesis (2020) - G. Andreadis, A. Iosup, V.S. van Beek, D.H.J. Epema, G. Gousios, Z. Erkin

Cloud datacenters provide a backbone to our digital society. Crucial to meeting increasing demand while maintaining efficient operation is the activity of capacity planning. Inaccurate capacity planning for cloud datacenters can lead to significant performance degradation, denser targets for failure, and unsustainable energy consumption. Although this activity is core to improving cloud infrastructure, relatively few comprehensive approaches and support tools exist, leaving many planners with merely rule-of-thumb judgement.

We propose Capelin, a data-driven, scenario-based capacity planning system for cloud datacenters. We design Capelin to address requirements we have derived from a unique survey of experts in charge of diverse datacenters in several countries. Capelin introduces the notion of portfolios of scenarios, which it leverages in its probing for alternative capacity-plans. At the core of the system, a trace-based, discrete-event simulator enables the exploration of different possible topologies, with support for scaling the volume, variety, and velocity of resources, and for horizontal (scale-out) and vertical (scale-up) scaling. The approach centers around a notion of portfolios of scenarios as a framework for probing alternative decisions and courses of events. Capelin gives detailed quantitative operational information for each scenario, which could facilitate human decisions in capacity planning.

We implement and open-source Capelin, and show through comprehensive trace-based experiments that it can aid practitioners. Although Capelin is designed to work across many kinds of datacenters, in this work we focus on private-cloud, business-critical workloads, and on public-cloud operations. The results give evidence that choices that seem reasonable and common in practice could be worse by a factor of 1.5-2.0 than the best, in terms of performance degradation or energy consumption. We also show evidence of Capelin identifying meaningful choices that are different from the baseline proposed by a team of professional datacenter engineers. We open-source Capelin and release data artifacts for public inspection and reuse.

A Systematic Design Space Exploration of Datacenter Schedulers

Bachelor thesis (2019) - Fabian Mastenbroek, Georgios Andreadis, Alexandru Iosup

Datacenter infrastructure has become vital for stakeholders across industry, academia and government. To operate efficiently, datacenter operators rely on a variety of complex scheduling techniques, to distribute user workloads across resources. In this work, we leverage a reference architecture for datacenter scheduling to design and implement an instrument for systematic design space exploration of datacenter schedulers. We construct a formal representation of the design space for datacenter schedulers, using scheduling policies collected from real-world schedulers. We then use a genetic algorithm in combination with trace-based simulation to explore the space, optimizing for workload metrics. Through several experiments, we assess the viability of the instrument. We find that our instrument is able to identify patterns in the workloads and adapt the scheduling policies appropriately. Overall, our work leads to numerous findings, which can become valuable for future comprehension and development of schedulers. ...

A Reference Architecture for Datacenter Scheduling

Design, Validation, and Experiments

Conference paper (2019) - Georgios Andreadis, Laurens Versluis, Fabian Mastenbroek, Alexandru Iosup

Datacenters act as cloud-infrastructure to stakeholders across industry, government, and academia. To meet growing demand yet operate efficiently, datacenter operators employ increasingly sophisticated scheduling systems, mechanisms, and policies. Although many scheduling techniques already exist, relatively little research has gone into the abstraction of the scheduling process itself, hampering design, tuning, and comparison of existing techniques. In this work, we propose a reference architecture for datacenter schedulers. The architecture follows five design principles: components with clearly distinct responsibilities, grouping of related components where possible, separation of mechanism from policy, scheduling as complex workflow, and hierarchical multi-scheduler structure. To demonstrate the validity of the reference architecture, we map state-of-the-art datacenter schedulers to it. We find that scheduler-stages are commonly underspecified in peer-reviewed publications. Through trace-based simulation and real-world experiments, we show that underspecification of scheduler-stages can lead to significant variations in performance. ...

Schaapi

Early detection of breaking changes based on API usage

Bachelor thesis (2018) - Joel Abrahams, Georgios Andreadis, Casper Boone, Florine Dekker, Maurício Finavaro Aniche, Asterios Katsifodimos

Library developers are often unaware of exactly how their library is used in practice. When a library developer changes the internals of a library, this may unintentionally affect or even break the library users' code. While it is possible to detect when a syntactic breaking change occurs, it is not as easy to detect semantic breaking changes, where the implicit contract of a functionality changes, sometimes unbeknownst to the library developer. Because library users rarely test the behaviour they expect of the library, neither the library developer nor the library user will be aware of the new behaviour.

As a library developer, you want to be able to see how a change in your library will affect your users before a new version of the library is deployed. More specifically, you want to gain insight into how users use the library, and want to see if and how changes affect users. This will allow you to determine whether the new version of the library is backwards compatible. Finally, after deploying the breaking changes, you want to notify the affected users of the changes and of a solution to the issue.

Schaapi, a tool for early detection of breaking changes based on API usages, addresses these needs. It mines public repositories for projects using a given library, analyses their usage of the API of that library, and generates tests that capture this behaviour. Finally, it offers a continuous integration service that automatically executes these tests against new versions of the library and warns developers of any potentially breaking changes in functionality. The tool has also been validated against real-world data to demonstrate its performance in realistic usage scenarios and to answer a selection of related research questions. ...

Massivizing computer systems

A vision to understand, design, and engineer computer ecosystems through and beyond modern distributed systems

Conference paper (2018) - Alexandru Iosup, Alexandru Uta, Laurens Versluis, Georgios Andreadis, Erwin Van Eyk, Tim Hegeman, Sacheendra Talluri, Vincent Van Beek, Lucian Toader

Our society is digital: industry, science, governance, and individuals depend, often transparently, on the inter-operation of large numbers of distributed computer systems. Although society takes them almost for granted, these computer ecosystems are not available for all, may not be affordable for long, and raise numerous other research challenges. Inspired by these challenges and by our experience with distributed computer systems, we envision Massivizing Computer Systems, a domain of computer science focusing on understanding, controlling, and successfully evolving such ecosystems. Beyond establishing and growing a body of knowledge about computer ecosystems and their constituent systems, the community in this domain should also aim to educate many about design and engineering for this domain, and all people about its principles. This is a call to the entire community: there is much to discover and achieve. ...