Understanding and Improving the Performance Consistency of Distributed Computing Systems

Doctoral thesis (2012)

Authors

M.N. Yigitbasi

Contributors

D.H.J. Epema (promotor)

Department

Software and Computer Technology (TU Delft)

Cloud computing, Scheduling, Distributed systems, Resource management, Performance evaluation, Grid computing, Cluster computing, Performance variability, Performance consistency, Failures

To reference this document use:

http://resolver.tudelft.nl/uuid:555299fd-6918-4dfb-b883-5f642a97b324

Published Date

04-12-2012

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Abstract

With the increasing adoption of distributed systems in both academia and industry, and with the increasing computational and storage requirements of distributed applications, users inevitably demand more from these systems. Moreover, users also depend on these systems for latency- and throughput-sensitive applications, such as interactive perception applications and MapReduce applications, which makes the performance of these systems even more important. Therefore, it is very important to users that distributed systems provide consistent performance, that is, a similar level of performance at all times. In this thesis we address the problem of understanding and improving the performance consistency of state-of-the-art distributed computing systems. To this end, we take an empirical approach and investigate various resource management, scheduling, and statistical modeling techniques with real system experiments in diverse distributed systems, such as clusters, multi-cluster grids, and clouds, using various types of workloads, such as bags-of-tasks (BoTs), interactive perception applications, and scientific workloads. In addition, as failures are known to be an important source of significant performance inconsistency, we also provide fundamental insights into the characteristics of failures in distributed systems, which are required to design systems that can mitigate the impact of failures on performance consistency.

Files

Yigitbasi-20121204.pdf

(PDF | 4.08 MB)