BG
B.I. Ghit
info
Please Note
<p>This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.</p>
8 records found
1
Data analytics frameworks enable users to process large datasets while hiding the complexity of scaling out their computations on large clusters of thousands of machines. Such frameworks parallelize the computations, distribute the data, and tolerate server failures by deploying their own runtime systems and distributed filesystems on subsets of the datacenter resources. Most of the computations required by data analytics applications are conceptually straight-forward and can be performed through massive parallelization of jobs into many fine-grained tasks. Providing efficient and fault-tolerant execution of these tasks in datacenters is ever more challenging and a variety of opportunities for performance optimization still exist. In this thesis we optimize the job performance of data analytics frameworks by addressing several fundamental challenges that arise in datacenters. The first challenge is multi-tenancy: having a large number of users may require isolating their workloads across multiple frameworks. Nevertheless, achieving performance isolation is difficult, because different frameworks may deliver very unbalanced service levels to their users. Second, users have become very demanding from these frameworks, thus expecting timely results for jobs that require only limited resources. However, even with a few long jobs that consume large fractions of the datacenter resources, short jobs may be delayed significantly. Third, improving the job performance in the face of failures is harder still, as we need to allocate extra resources to recompute work which was already done. In order to address these challenges we design, implement, and test several scheduling policies for the evolving usage trends that are derived from the analysis of basic theoretical models. We take an experimental approach and we evaluate the performance of our policies with real-world experiments in a datacenter, using representative workloads and standard benchmarks. Furthermore, we bridge the gap between those experiments and prior theoretical work by performing large-scale simulations of scheduling policies.
...
Data analytics frameworks enable users to process large datasets while hiding the complexity of scaling out their computations on large clusters of thousands of machines. Such frameworks parallelize the computations, distribute the data, and tolerate server failures by deploying their own runtime systems and distributed filesystems on subsets of the datacenter resources. Most of the computations required by data analytics applications are conceptually straight-forward and can be performed through massive parallelization of jobs into many fine-grained tasks. Providing efficient and fault-tolerant execution of these tasks in datacenters is ever more challenging and a variety of opportunities for performance optimization still exist. In this thesis we optimize the job performance of data analytics frameworks by addressing several fundamental challenges that arise in datacenters. The first challenge is multi-tenancy: having a large number of users may require isolating their workloads across multiple frameworks. Nevertheless, achieving performance isolation is difficult, because different frameworks may deliver very unbalanced service levels to their users. Second, users have become very demanding from these frameworks, thus expecting timely results for jobs that require only limited resources. However, even with a few long jobs that consume large fractions of the datacenter resources, short jobs may be delayed significantly. Third, improving the job performance in the face of failures is harder still, as we need to allocate extra resources to recompute work which was already done. In order to address these challenges we design, implement, and test several scheduling policies for the evolving usage trends that are derived from the analysis of basic theoretical models. We take an experimental approach and we evaluate the performance of our policies with real-world experiments in a datacenter, using representative workloads and standard benchmarks. Furthermore, we bridge the gap between those experiments and prior theoretical work by performing large-scale simulations of scheduling policies.
Better Safe than Sorry
Grappling with Failures of In-Memory Data Analytics Frameworks
Providing fault-tolerance is of major importance for data analytics frameworks such as Hadoop and Spark, which are typically deployed in large clusters that are known to experience high failures rates. Unexpected events such as compute node failures are in particular an important challenge for in-memory data analytics frameworks, as the widely adopted approach to deal with them is to recompute work already done. Recomputing lost work, however, requires allocation of extra resource to re-execute tasks, thus increasing the job runtimes. To address this problem, we design a checkpointing system called panda that is tailored to the intrinsic characteristics of data analytics frameworks. In particular, panda employs fine-grained checkpointing at the level of task outputs and dynamically identifies tasks that are worthwhile to be checkpointed
rather than be recomputed. As has been abundantly shown, tasks of data analytics jobs may have very variable runtimes and output sizes. These properties form the basis of three checkpointing policies which we incorporate into panda. We first empirically evaluate panda on a multicluster system with single data analytics applications under space-correlated failures, and find that panda is close to the performance of a fail-free execution in unmodified Spark for a large range of concurrent failures. Then we perform simulations of complete workloads, mimicking the size and operation of a Google cluster, and show that panda provides significant improvements in the average job runtime for wide ranges of the failure rate and system load. ...
rather than be recomputed. As has been abundantly shown, tasks of data analytics jobs may have very variable runtimes and output sizes. These properties form the basis of three checkpointing policies which we incorporate into panda. We first empirically evaluate panda on a multicluster system with single data analytics applications under space-correlated failures, and find that panda is close to the performance of a fail-free execution in unmodified Spark for a large range of concurrent failures. Then we perform simulations of complete workloads, mimicking the size and operation of a Google cluster, and show that panda provides significant improvements in the average job runtime for wide ranges of the failure rate and system load. ...
Providing fault-tolerance is of major importance for data analytics frameworks such as Hadoop and Spark, which are typically deployed in large clusters that are known to experience high failures rates. Unexpected events such as compute node failures are in particular an important challenge for in-memory data analytics frameworks, as the widely adopted approach to deal with them is to recompute work already done. Recomputing lost work, however, requires allocation of extra resource to re-execute tasks, thus increasing the job runtimes. To address this problem, we design a checkpointing system called panda that is tailored to the intrinsic characteristics of data analytics frameworks. In particular, panda employs fine-grained checkpointing at the level of task outputs and dynamically identifies tasks that are worthwhile to be checkpointed
rather than be recomputed. As has been abundantly shown, tasks of data analytics jobs may have very variable runtimes and output sizes. These properties form the basis of three checkpointing policies which we incorporate into panda. We first empirically evaluate panda on a multicluster system with single data analytics applications under space-correlated failures, and find that panda is close to the performance of a fail-free execution in unmodified Spark for a large range of concurrent failures. Then we perform simulations of complete workloads, mimicking the size and operation of a Google cluster, and show that panda provides significant improvements in the average job runtime for wide ranges of the failure rate and system load.
rather than be recomputed. As has been abundantly shown, tasks of data analytics jobs may have very variable runtimes and output sizes. These properties form the basis of three checkpointing policies which we incorporate into panda. We first empirically evaluate panda on a multicluster system with single data analytics applications under space-correlated failures, and find that panda is close to the performance of a fail-free execution in unmodified Spark for a large range of concurrent failures. Then we perform simulations of complete workloads, mimicking the size and operation of a Google cluster, and show that panda provides significant improvements in the average job runtime for wide ranges of the failure rate and system load.
Conference paper
(2017)
-
Alexey Ilyushkin, Ahmed Ali-Eldin, Nikolas Herbst, Alessandro Papadopoulos, Bogdan Ghit, Dick Epema, Alexandru Iosup
Simplifying the task of resource management and scheduling for customers, while still delivering complex Quality-of-Service (QoS), is key to cloud computing. Many autoscaling policies have been proposed in the past decade to decide on behalf of cloud customers when and how to provision resources to a cloud application utilizing cloud elasticity features. However, in prior work, when a new policy is proposed, it is seldom compared to the state-of-the-art, and is
often compared only to static provisioning using a predefined QoS target. This reduces the ability of cloud customers and of cloud operators to choose and deploy an autoscaling policy. In our work, we conduct an experimental performance evaluation of autoscaling policies, using as application model workflows, a commonly used formalism for automating resource management for applications with well-defined yet complex structure. We present a detailed
comparative study of general state-of-the-art autoscaling policies, along with two new workflow-specific policies. To understand the performance differences between the 7 policies, we conduct various forms of pairwise and group comparisons. We report both individual and aggregated metrics. Our results highlight the trade-offs between the suggested policies, and thus enable a better understanding of the current state-of-the-art. ...
often compared only to static provisioning using a predefined QoS target. This reduces the ability of cloud customers and of cloud operators to choose and deploy an autoscaling policy. In our work, we conduct an experimental performance evaluation of autoscaling policies, using as application model workflows, a commonly used formalism for automating resource management for applications with well-defined yet complex structure. We present a detailed
comparative study of general state-of-the-art autoscaling policies, along with two new workflow-specific policies. To understand the performance differences between the 7 policies, we conduct various forms of pairwise and group comparisons. We report both individual and aggregated metrics. Our results highlight the trade-offs between the suggested policies, and thus enable a better understanding of the current state-of-the-art. ...
Simplifying the task of resource management and scheduling for customers, while still delivering complex Quality-of-Service (QoS), is key to cloud computing. Many autoscaling policies have been proposed in the past decade to decide on behalf of cloud customers when and how to provision resources to a cloud application utilizing cloud elasticity features. However, in prior work, when a new policy is proposed, it is seldom compared to the state-of-the-art, and is
often compared only to static provisioning using a predefined QoS target. This reduces the ability of cloud customers and of cloud operators to choose and deploy an autoscaling policy. In our work, we conduct an experimental performance evaluation of autoscaling policies, using as application model workflows, a commonly used formalism for automating resource management for applications with well-defined yet complex structure. We present a detailed
comparative study of general state-of-the-art autoscaling policies, along with two new workflow-specific policies. To understand the performance differences between the 7 policies, we conduct various forms of pairwise and group comparisons. We report both individual and aggregated metrics. Our results highlight the trade-offs between the suggested policies, and thus enable a better understanding of the current state-of-the-art.
often compared only to static provisioning using a predefined QoS target. This reduces the ability of cloud customers and of cloud operators to choose and deploy an autoscaling policy. In our work, we conduct an experimental performance evaluation of autoscaling policies, using as application model workflows, a commonly used formalism for automating resource management for applications with well-defined yet complex structure. We present a detailed
comparative study of general state-of-the-art autoscaling policies, along with two new workflow-specific policies. To understand the performance differences between the 7 policies, we conduct various forms of pairwise and group comparisons. We report both individual and aggregated metrics. Our results highlight the trade-offs between the suggested policies, and thus enable a better understanding of the current state-of-the-art.
Tyrex
Size-Based Resource Allocation in MapReduce Frameworks
Many large-scale data analytics infrastructures are employed for a wide variety of jobs, ranging from short interactive queries to large data analysis jobs that may take hours or even days to complete. As a consequence, data-processing frameworks like MapReduce may have workloads consisting of jobs with heavy-tailed processing requirements. With such workloads, short jobs may experience slowdowns that are an order of magnitude larger than large jobs do, while the users may expect slowdowns that are more in proportion with the job sizes. To address this problem of large job slowdown variability in MapReduce frameworks, we design a scheduling system called TYREX that is inspired by the well-known TAGS task assignment policy in distributed-server systems. In particular, TYREX partitions the resources of a MapReduce framework, allowing any job running in any partition to read data stored on any machine, imposes runtime limits in the partitions, and successively executes parts of jobs in a work-conserving way in these partitions until they can run to completion. We develop a statistical model for dynamically setting the runtime limits that achieves near optimal job slowdown performance, and we empirically evaluate TYREX on a cluster system with workloads consisting of both synthetic and real-world benchmarks. We find that TYREX cuts in half the job slowdown variability while preserving the median job slowdown when compared to state-of-the-art MapReduce schedulers such as FIFO and FAIR. Furthermore, TYREX reduces the job slowdown at the 95th percentile by more than 50% when compared to FIFO and by 20-40% when compared to FAIR.
...
Many large-scale data analytics infrastructures are employed for a wide variety of jobs, ranging from short interactive queries to large data analysis jobs that may take hours or even days to complete. As a consequence, data-processing frameworks like MapReduce may have workloads consisting of jobs with heavy-tailed processing requirements. With such workloads, short jobs may experience slowdowns that are an order of magnitude larger than large jobs do, while the users may expect slowdowns that are more in proportion with the job sizes. To address this problem of large job slowdown variability in MapReduce frameworks, we design a scheduling system called TYREX that is inspired by the well-known TAGS task assignment policy in distributed-server systems. In particular, TYREX partitions the resources of a MapReduce framework, allowing any job running in any partition to read data stored on any machine, imposes runtime limits in the partitions, and successively executes parts of jobs in a work-conserving way in these partitions until they can run to completion. We develop a statistical model for dynamically setting the runtime limits that achieves near optimal job slowdown performance, and we empirically evaluate TYREX on a cluster system with workloads consisting of both synthetic and real-world benchmarks. We find that TYREX cuts in half the job slowdown variability while preserving the median job slowdown when compared to state-of-the-art MapReduce schedulers such as FIFO and FAIR. Furthermore, TYREX reduces the job slowdown at the 95th percentile by more than 50% when compared to FIFO and by 20-40% when compared to FAIR.
Conference paper
(2016)
-
Ahmed Ali-Eldin, Alexey Ilyushkin, Bogdan Ghit, Nikolas Herbst, Alessandro Papadopoulos, Alexandru Iosup
Rapid elasticity is one of the essential characteristics of cloud computing identified by NIST. Elasticity allows resources to be provisioned and released to scale rapidly out ward and in ward according to demand. Tens -- if not hundreds -- of algorithms have been proposed in the literature to automatically achieve elastic provisioning. These algorithms are typically referred to as elasticity algorithms, dynamic provisioning techniques or autoscalers. While trying to solve the same problem, sometimes with differing assumption, many of these algorithms are either compared to static provisioning or to a predefined QoS target, e.g., predefined response time target, with very little -- or no -- comparison to previously published work. This reduces the ability of an application owner or a cloud operator to choose and deploy a suitable algorithm from the literature. Many of these algorithms have been tested with one single -- real or synthetic -- workload in a specific use-case. While all published algorithms are shown to work in the specific use-case they were designed for with the, typically short, workloads tested with, it is seldom the case that the real scenarios will be any thing close to the test cases for which the algorithms are shown to work. Bursts occur in workloads occasionally. Workload dynamics change over time and the load-mix of an application significantly affects how provisioning should be done.
...
Rapid elasticity is one of the essential characteristics of cloud computing identified by NIST. Elasticity allows resources to be provisioned and released to scale rapidly out ward and in ward according to demand. Tens -- if not hundreds -- of algorithms have been proposed in the literature to automatically achieve elastic provisioning. These algorithms are typically referred to as elasticity algorithms, dynamic provisioning techniques or autoscalers. While trying to solve the same problem, sometimes with differing assumption, many of these algorithms are either compared to static provisioning or to a predefined QoS target, e.g., predefined response time target, with very little -- or no -- comparison to previously published work. This reduces the ability of an application owner or a cloud operator to choose and deploy a suitable algorithm from the literature. Many of these algorithms have been tested with one single -- real or synthetic -- workload in a specific use-case. While all published algorithms are shown to work in the specific use-case they were designed for with the, typically short, workloads tested with, it is seldom the case that the real scenarios will be any thing close to the test cases for which the algorithms are shown to work. Bursts occur in workloads occasionally. Workload dynamics change over time and the load-mix of an application significantly affects how provisioning should be done.
A well-known problem when executing data-intensive workloads with such frameworks as MapReduce is that small jobs with processing requirements counted in the minutes may suffer from the presence of huge jobs requiring hours or days of compute time, leading to a job slowdown distribution that is very variable and that is uneven across jobs of different sizes. Previous solutions to this problem for sequential or rigid jobs in single-server and distributed-server systems include priority-based FeedBack Queueing (FBQ), and Task Assignment by Guessing Sizes (TAGS), which kills and restarts from scratch on another server jobs that exceed the local time limit. In this paper, we derive four scheduling policies that are rightful descendants of existing size-based scheduling disciplines (among which FBQ and TAGS) with appropriate adaptations to data-intensive frameworks. The two main mechanisms employed by these policies are partitioning the resources of the datacenter, and isolating jobs with different size ranges. We evaluate these policies by means of realistic simulations of representative MapReduce workloads from Facebook and show that under the best of these policies, the vast majority of short jobs in MapReduce workloads experience close to ideal job slowdowns even under high system loads (in the range of 0.7-0.9) while the slowdown of the very large jobs is not prohibitive. We validate our simulations by means of experiments on a real multicluster system, and we find that the job slowdown performance results obtained with both match remarkably well.
...
A well-known problem when executing data-intensive workloads with such frameworks as MapReduce is that small jobs with processing requirements counted in the minutes may suffer from the presence of huge jobs requiring hours or days of compute time, leading to a job slowdown distribution that is very variable and that is uneven across jobs of different sizes. Previous solutions to this problem for sequential or rigid jobs in single-server and distributed-server systems include priority-based FeedBack Queueing (FBQ), and Task Assignment by Guessing Sizes (TAGS), which kills and restarts from scratch on another server jobs that exceed the local time limit. In this paper, we derive four scheduling policies that are rightful descendants of existing size-based scheduling disciplines (among which FBQ and TAGS) with appropriate adaptations to data-intensive frameworks. The two main mechanisms employed by these policies are partitioning the resources of the datacenter, and isolating jobs with different size ranges. We evaluate these policies by means of realistic simulations of representative MapReduce workloads from Facebook and show that under the best of these policies, the vast majority of short jobs in MapReduce workloads experience close to ideal job slowdowns even under high system loads (in the range of 0.7-0.9) while the slowdown of the very large jobs is not prohibitive. We validate our simulations by means of experiments on a real multicluster system, and we find that the job slowdown performance results obtained with both match remarkably well.
Workflows are important computational tools in many branches of science, and because of the dependencies among their tasks and their widely different characteristics, scheduling them is a difficult problem. Most research on scheduling workflows has focused on the offline problem of minimizing the makespan of single workflows with known task runtimes. The problem of scheduling multiple workflows has been addressed either in an offline fashion, or still with the assumption of known task runtimes. In this paper, we study the problem of scheduling workloads consisting of an arrival stream of workflows without task runtime estimates. The resource requirements of a workflow can significantly fluctuate during its execution. Thus, we present four scheduling policies for workloads of workflows with as their main feature the extent to which they reserve processors to workflows to deal with these fluctuations. We perform simulations with realistic synthetic workloads and we show that any form of processor reservation only decreases the overall system performance and that a greedy backfilling-like policy performs best.
...
Workflows are important computational tools in many branches of science, and because of the dependencies among their tasks and their widely different characteristics, scheduling them is a difficult problem. Most research on scheduling workflows has focused on the offline problem of minimizing the makespan of single workflows with known task runtimes. The problem of scheduling multiple workflows has been addressed either in an offline fashion, or still with the assumption of known task runtimes. In this paper, we study the problem of scheduling workloads consisting of an arrival stream of workflows without task runtime estimates. The resource requirements of a workflow can significantly fluctuate during its execution. Thus, we present four scheduling policies for workloads of workflows with as their main feature the extent to which they reserve processors to workflows to deal with these fluctuations. We perform simulations with realistic synthetic workloads and we show that any form of processor reservation only decreases the overall system performance and that a greedy backfilling-like policy performs best.
Conference paper
(2014)
-
Bogdan Ghit, Mihai Capota, Tim Hegeman, Jan Hidders, Dick Epema, Alexandru Iosup