B.I. Ghit | TU Delft Repository

Optimizing the Performance of Data Analytics Frameworks

Doctoral thesis (2017) - B.I. Ghit (author)

Data analytics frameworks enable users to process large datasets while hiding the complexity of scaling out their computations on large clusters of thousands of machines. Such frameworks parallelize the computations, distribute the data, and tolerate server failures by deploying their own runtime systems and distributed filesystems on subsets of the datacenter resources. Most of the computations required by data analytics applications are conceptually straightforward and can be performed through massive parallelization of jobs into many fine-grained tasks. Providing efficient and fault-tolerant execution of these tasks in datacenters is ever more challenging, and a variety of opportunities for performance optimization still exist. In this thesis we optimize the job performance of data analytics frameworks by addressing several fundamental challenges that arise in datacenters. The first challenge is multi-tenancy: having a large number of users may require isolating their workloads across multiple frameworks. Nevertheless, achieving performance isolation is difficult, because different frameworks may deliver very unbalanced service levels to their users. Second, users have become very demanding of these frameworks, expecting timely results for jobs that require only limited resources. However, even with a few long jobs that consume large fractions of the datacenter resources, short jobs may be delayed significantly. Third, improving the job performance in the face of failures is harder still, as we need to allocate extra resources to recompute work that was already done. In order to address these challenges we design, implement, and test several scheduling policies for the evolving usage trends that are derived from the analysis of basic theoretical models. We take an experimental approach and evaluate the performance of our policies with real-world experiments in a datacenter, using representative workloads and standard benchmarks. Furthermore, we bridge the gap between those experiments and prior theoretical work by performing large-scale simulations of scheduling policies.

Better Safe than Sorry

Grappling with Failures of In-Memory Data Analytics Frameworks

Conference paper (2017) - B.I. Ghit (author) , Dick H.J. Epema (author)

Providing fault tolerance is of major importance for data analytics frameworks such as Hadoop and Spark, which are typically deployed in large clusters that are known to experience high failure rates. Unexpected events such as compute node failures are in particular an important ...

An Experimental Performance Evaluation of Autoscaling Policies for Complex Workflows

Conference paper (2017) - Alexey Ilyushkin (author) , Ahmed Ali-Eldin (author) , Nikolas Herbst (author) , Alessandro Papadopoulos (author) , B.I. Ghit (author) , Dick H.J. Epema (author) , Alexandru Iosup (author)

Simplifying the task of resource management and scheduling for customers, while still delivering complex Quality-of-Service (QoS), is key to cloud computing. Many autoscaling policies have been proposed in the past decade to decide on behalf of cloud customers when and how to pro ...

Tyrex

Size-Based Resource Allocation in MapReduce Frameworks

Conference paper (2016) - B.I. Ghit (author) , Dick H.J. Epema (author)

Many large-scale data analytics infrastructures are employed for a wide variety of jobs, ranging from short interactive queries to large data analysis jobs that may take hours or even days to complete. As a consequence, data-processing frameworks like MapReduce may have workloads ...

Which Cloud Auto-Scaler Should I Use for my Application?: Benchmarking Auto-Scaling Algorithms

Poster Paper

Conference paper (2016) - Ahmed Ali-Eldin (author) , Alexey Ilyushkin (author) , Bogdan Ghit (author) , Nikolas Herbst (author) , Alessandro Papadopoulos (author) , A Iosup (author)

Rapid elasticity is one of the essential characteristics of cloud computing identified by NIST. Elasticity allows resources to be provisioned and released to scale rapidly outward and inward according to demand. Tens -- if not hundreds -- of algorithms have been proposed in the ...

Reducing Job Slowdown Variability for Data-Intensive Workloads

Conference paper (2015) - B.I. Ghit (author) , Dick H.J. Epema (author)

A well-known problem when executing data-intensive workloads with such frameworks as MapReduce is that small jobs with processing requirements counted in the minutes may suffer from the presence of huge jobs requiring hours or days of compute time, leading to a job slowdown distr ...

Scheduling Workloads of Workflows with Unknown Task Runtimes

Conference paper (2015) - Alexey Ilyushkin (author) , B.I. Ghit (author) , Dick H.J. Epema (author)

Workflows are important computational tools in many branches of science, and because of the dependencies among their tasks and their widely different characteristics, scheduling them is a difficult problem. Most research on scheduling workflows has focused on the offline problem ...

KOALA-C: A task allocator for integrated multicluster and multicloud environments

Conference paper (2014) - L. Fei (author) , Bogdan Ghit (author) , A Iosup (author) , DHJ Epema (author)

V for Vicissitude: The Challenge of scaling Complex Big Data Workflows

Conference paper (2014) - B.I. Ghit (author) , M Capotă (author) , Tim Hegeman (author) , AJH Hidders (author) , Dick H.J. Epema (author) , A Iosup (author)

Towards an optimized big data processing system

Conference paper (2013) - Bogdan Ghit (author) , A Iosup (author) , DHJ Epema (author)

Demonstrating BooSTER: The broadcast stream transmission epidemic repair

Conference paper (2012) - B.I. Ghit (author) , S. Voulgaris (author) , A Harwood (author)

Resource Management for Dynamic MapReduce Clusters in Multicluster Systems

Conference paper (2012) - Bogdan Ghit (author) , M.N. Yigitbasi (author) , D.H.J. Epema (author)