Institutional Repository




Understanding and Improving the Performance Consistency of Distributed Computing Systems

Attachments

These file attachments were under embargo and became publicly available when the embargo was lifted on 13 November 2012.

Cite or link this publication as: doi:10.4233/uuid:555299fd-6918-4dfb-b883-5f642a97b324
Author: Yigitbasi, M.N.
Promotor: Epema, D.H.J.
Faculty: Electrical Engineering, Mathematics and Computer Science
Department: Software and Computer Technology
Type: Dissertation
Date: 2012-12-04
Embargo lifted: 2012-11-13
ISBN: 9789461860712
Keywords: grid computing · cloud computing · cluster computing · distributed systems · performance variability · performance consistency · scheduling · resource management · performance evaluation · failures
Rights: (c) 2012 Yigitbasi, M.N.

Abstract

With the increasing adoption of distributed systems in both academia and industry, and with the increasing computational and storage requirements of distributed applications, users inevitably demand more from these systems. Moreover, users also depend on these systems for latency- and throughput-sensitive applications, such as interactive perception applications and MapReduce applications, which makes the performance of these systems even more important. It is therefore very important to users that distributed systems provide consistent performance, that is, a similar level of performance at all times. In this thesis we address the problem of understanding and improving the performance consistency of state-of-the-art distributed computing systems. To this end, we take an empirical approach and investigate various resource management, scheduling, and statistical modeling techniques through real-system experiments in diverse distributed systems, such as clusters, multi-cluster grids, and clouds, using various types of workloads, such as bags-of-tasks (BoTs), interactive perception applications, and scientific workloads.
In addition, as failures are known to be an important source of performance inconsistency, we also provide fundamental insights into the characteristics of failures in distributed systems. These insights are required to design systems that can mitigate the impact of failures on performance consistency.
