| 1 |
|
Application-Oriented Scheduling in Multicluster Grids
Grid computing appeared in the mid 1990s with the vision of sharing geographically dispersed large computational resources for executing computation-intensive scientific applications. Today, we can name numerous grid projects that run successfully to solve challenging scientific problems such as the grid project of European Organization for Nuclear Research (CERN), which combines thousands of computers worldwide (over 200 sites in about 30 countries) to store and analyze huge amounts of data, which are produced by the Large Hadron Collider (LHC) at CERN.
The resources in a grid system are typically heterogeneous since they belong to different administrative domains, and they are managed by proprietary policies. To cope with this heterogeneity, a grid relies on a layer of middleware, which offers transparent access to the distributed resources and simplifies the collaboration between organizations.
Grids also need high-level scheduling systems that use grid middleware in order to map application tasks to resources and then manage their execution on behalf of users. However, scheduling in grids is challenging due to the dynamic nature of the grid resources as well as to the lack of control of those resources. The wide variety in the structural and the communication characteristics of the applications submitted to grids further complicate grid scheduling, and may lead to poor or unpredictable performance unless these characteristics are taken into account.
In this thesis we address the challenge of designing and analyzing realistic and practical application-oriented scheduling mechanisms in multicluster grid systems. Application-oriented scheduling focuses on the optimization of user-centric performance criteria, such as application execution time, with methods that are specialized for different types of applications. In this thesis we cover a wide-range of grid application types, including parallel applications that may need co-allocation or malleability, bags-of-tasks that can benefit from cycle scavenging, and workflow applications that may push the system to its limits with their computation and data requirements. We investigate the performance of our scheduling mechanisms and policies in a real multicluster grid system, the DAS, using our KOALA multicluster grid scheduler, as well as with simulations using realistic scenarios.
|
[PDF]
[Abstract]
|
| 2 |
|
Trace-based performance analysis of scheduling bags of tasks in grids
Grid computing promises large scale computing facilities based on distributed systems. Much research has been done on the subject of increasing the performance of grids. We believe that an adequate performance analysis of grids requires knowledge
of the workload and the architecture of the grid. Currently, researchers assume that grids are similar to other distributed systems, such as massively parallel computers. However, workloads in grids differ from other distributed systems, because
they consist for a significant part of bags-of-tasks. This research presents a method to model the workload of grids realistically, which enables us to analyze the performance of those systems. We have created a flexible workload model that is specifically tailed for grids. The model explicitly handles bag-of-tasks, which comprise the majority of grid workloads. This workload model has been built using a vast amount of workload trace data from seven real-world grids. The workload model enables us to conduct a performance analysis in which we analyze the impact of several workload characteristics, task selection and scheduling policies, and resource management architectures on system performance. We use simulations to systematically and realistically investigate the system performance in various scenarios. This research has resulted in a grid performance analysis toolbox, a software package that allows researchers to analyze, model, and generate workloads of grids. In addition, we have contributed trace data and analysis to the community by means of the Grid Workloads Archive, an on-line archive of trace data analyzed with our toolbox.
|
[PDF]
[Abstract]
|
| 3 |
|
High-performance Processing in Networked and Grid Environments
In this dissertation, we present several techniques to achieve the high-performance processing in networked and grid environments. Many applications need a high-performance processing system to execute efficiently.
High-performance processing mainly stems from parallelism. The parallel nature of grid computing is a very attractive solution to exploit the mentioned parallelism by executing either different parts of an application or several applications in parallel. In a grid system, the most important resources are computing and communication resources. The computing resources are the processors in the nodes on the grid. Communication within the grid is important for distributing tasks and their required data to the nodes within the grid.
We propose an innovative high-performance platform to utilize reconfigurable processors in grid environments. Furthermore, we focus on the communication infrastructures and network processing (processing required for packets) platforms to utilize them through the grid environments. The collaboration of reconfigurable processors in a grid environment is presented and several compute-intensive multimedia kernels are simulated. Subsequently, we introduce three approaches to accelerate network processing tasks using Bloom filters in the networked and grid environments. The first and second techniques present a cache architecture for a counting Bloom filter (CCBF) and a memory optimization approach for Bloom filters using an additional hashing function (BFAH). The third technique proposes a power efficient pipelined Bloom filter.
We present the results of our proposed approaches in collaboration of reconfigurable processors in grid computing (CRGC) and Bloom filters in network processing applications, e.g., packet classification. The experimental results show that the CRGC approach improves performance up to 7.2x and 2.5x compared to a GPP and the collaboration of GPPs, respectively. The results of the CCBF and BFAH for packet classification show that the proposed techniques decrease the number of memory accesses when compared to a standard Bloom filter.
|
[PDF]
[Abstract]
|
| 4 |
|
Quantifying Pathology in Diffusion Weighted MRI
In this thesis algorithms are proposed for quantification of pathology in Diffusion Weighted Magnetic Resonance Imaging (DW-MRI) data. Functional evidence for brain diseases can be explained by specific structural loss in the white matter of the brain. That is, certain biomarkers may exist where the disease inhibits improper functioning. Axonal and myelin sheath damage hamper neural connectivity. This can be assessed in vivo by measuring a change in the diffusion of water molecules. DW-MRI may deliver such biomarkers by capturing subtle changes in the diffusion process in an early disease stage. Diffusion tensor derived scalar measures such as the Mean Diffusivity and Fractional Anisotropy (FA) quantify this process. When comparing such measures on group level, patients may be found to significantly differ from healthy controls.
Conventional analysis treats measurements in each voxel independently. Due to high inter-brain connectivity, we hypothesize that multiple brain regions are involved in complex brain diseases such as schizophrenia. We introduce a new machine learning framework for performing such analysis in comparative studies on volumetric data. By ‘shaving' the mapping computed by discriminant analysis, a characteristic set of regions is automatically extracted. We opt to ‘prune' the dataset by iteratively discarding misclassified subjects from the cohort, such that the mapping is based merely on representative prototypes. Then a comparison is made between a linear and non-linear kernel discriminant analysis, to identify the dimensionality of the problem.
Methods are proposed for modeling the 30% of white matter that cannot be adequately described by a single tensor. These regions are currently disregarded in comparitive studies. More complex diffusion models that are introduced need to be adequately evaluated. We propose a new method for creating experimental phantom data of fiber crossings, by mixing the DWI-signals from high FA-regions. These models demand HARDI (High Angular Resolution Diffusion Imaging) data acquired with higher SNR, diffusion weighting and angular resolution. In comparitive studies, scanning time may be insufficient for meeting the requirements. We propose a method to create a dual tensor atlas from multiple coregistered non-HARDI datasets. The random fluctuations in the pose of subjects in the scanner as well as anatomical heterogeneity contribute to an increased angular resolution. Finally, we build an optimization framework for estimating both diffusion shape and orientation in fiber crossings. This work sets fundamental limits for comparative studies to correctly analyze crossing white matter structures. Firstly, it assesses the precision and accuracy with which parameters may be estimated. Secondly, the optimal acquisition parameters are selected in order to do so. This model allows for estimating consistent FA-profiles along crossing tracts.
Fiber tracts provide a specific frame of reference for computing statistics. We perform 3D tract-based comparison of tensor-derived indices between groups. Inter-subject correspondence is achieved by non-rigid registration based on a joint clustering and matching. The clustering delivers atlas points that serve as a frame of reference for performing the analysis. Spatial consistency is taken to reflect a significant difference between groups.
The potential of our methods is demonstrated in two comparative studies: on Childhood Cancer survivors and Amyotrophic Lateral Sclerosis respectively.
High throughput analysis of our data is realized by adopting a grid computing approach. The grid provides fast and easy access to shared resources. We present our progress over the past four years in the development and porting the DW-MRI analysis pipeline to grids. By doing do, our algorithms and results can also be accessed by fellow researchers.
|
[PDF]
[Abstract]
|
| 5 |
|
Fast solution of nonsymmetric linear systems on Grid computers using parallel variants of IDR(s)
IDR(s) is a family of fast algorithms for iteratively solving large nonsymmetric linear systems [14]. With cluster computing and in particular with Grid computing, the inner product is a bottleneck operation. In this paper, three techniques are combined in order to alleviate this bottleneck. Firstly, the efficient and stable IDR(s) algorithm from [16] is reformulated in such a way that it has a single global synchronisation point per iteration step. Secondly, the so–called test matrix is chosen so that
the work, communication, and storage involving this matrix is minimised in multi–cluster environments. Finally, a methodology is presented for a–priori estimation of the optimal value of s using only problem and machine–based parameters. Numerical experiments applied to a 3D convection–diffusion problem are performed on the DAS–3 Grid computer, demonstrating the effectiveness of these three techniques.
|
[PDF]
[Abstract]
|
| 6 |
|
Exploiting the flexibility of IDR(s) for grid computing
The IDR(s) method that is proposed in [26] is an efficient limited memory method for solving large nonsymmetric systems of linear equations. In [11] an IDR(s) variant is described that has a single synchronisation point per iteration step, which makes this variant well-suited for parallel and grid computing. In this paper, we combine this IDR(s) variant with an a-synchronous preconditioning iteration to further improve the performance of IDR(s) on a grid computer. A-synchronous preconditioners do not require expensive synchronisation and adapt to volatile computational resources, and are therefore well-suited for such a computational environment. However, an
a-synchronous preconditioning operation is also non-constant by nature: the preconditioner
changes in every iteration. The success of the combination of IDR(s) with an a-synchronous preconditioner therefore depends on the flexibility of IDR(s). We will explain why IDR(s) can be used as a flexible method, and we will successfully use the combination of IDR(s) with an a-synchronous preconditioner for solving large convection-diffusion problems. The numerical experiments are performed on the DAS-3 grid computer, which is composed of five geographically separated parallel
clusters.
|
[PDF]
[Abstract]
|
| 7 |
|
Understanding and Improving the Performance Consistency of Distributed Computing Systems
With the increasing adoption of distributed systems in both academia and industry, and with the increasing computational and storage requirements of distributed applications, users inevitably demand more from these systems. Moreover, users also depend on these systems for latency and throughput sensitive applications, such as interactive perception applications and MapReduce applications, which make the performance of these systems even more important. Therefore, for the users it is very important that distributed systems provide consistent performance, that is, the system provides a similar level of performance at all times. In this thesis we address the problem of understanding and improving the performance consistency of state-of-the-art distributed computing systems. Towards this end, we take an empirical approach and we investigate various resource management, scheduling, and statistical modeling techniques with real system experiments in diverse distributed systems, such as clusters, multi-cluster grids, and clouds, using various types of workloads, such as Bags-of-tasks (BoTs), interactive perception applications, and scientific workloads.
In addition, as failures are known to be an important source of significant performance inconsistency, we also provide fundamental insights into the characteristics of failures in distributed systems, which is required to design systems that can mitigate the impact of failures on performance consistency.
|
[PDF]
[Abstract]
|
| 8 |
|
A framework for the study of grid inter-operation mechanisms
The study of the history of computing infrastructures reveals an integration trend. For example, the explosive growth of the Internet in the 1990s was the result of an integration process started in the 1960s with the emerging networks of computers. By using the Internet, millions of users were capable of accessing information anytime and anywhere, much like other daily utilities such as water, electricity, and telephone. However, an important category of users remained under-served: the users with large computational and storage requirements, e.g., the scientists, the companies that focus on data analysis, and the governmental departments that manage the interaction between the state and the population (such as census, tax, and public health). Thus, in the mid-1990s, the vision of the (Computing) Grid as a universal computing utility was formulated. The main benefits promised by the Grid are similar to those of other integration efforts: extended and optimized service of the integrated network, and significant reductions of maintenance and operation costs through sharing and better scheduling. While the universal Grid has yet to be developed, large-scale distributed computing infrastructures that provide their users with seamless and secure access to computing resources, individually called Grid parts or simply grids, have been built throughout the world---in different countries, for different sciences, and both for production work and for computer-science research. At the same time, the main technological alternatives to grids, that is, supercomputers and large clusters, have evolved into much larger, scalable, and reliable systems. Thus, the integration of existing grids into larger infrastructures and finally into The Grid is key in keeping the grid vision attractive for its potential users. The integration of grids raises a double challenge, one related with the efficient scaling of a distributed computing system, the second associated with the operation of a system across different ownership and administrative domains. Thus, many of the traditional approaches for inter-operating computer systems, such as those based on completely centralized or purely decentralized system architectures, are eliminated from the start. To mark the distinction between the typical problem of integrating smaller components into a larger system and the double challenge of grid integration, we call the latter the problem of grid inter-operation. In this thesis we approach the problem of grid inter-operation with two main objectives: to design a comprehensive framework for the study of grid inter-operation mechanisms, and to provide an initial but good solution for this problem. We design a framework for the study of grid inter-operation that includes a toolbox for grid inter-operation research and a method for the study of grid inter-operation mechanisms. In the research toolbox we include the Grid Workloads Archive (GWA), a comprehensive model for grid resources and workloads, the GrenchMark performance evaluation framework, and the Delft Grid Simulation (DGSim) framework for repeated and realistic simulations of multi-cluster and multi-grid environments. The GWA and our comprehensive model show that grid computing is mostly used in practice for single-processor jobs and not for parallel computing, which raises previously ignored challenges related to the volume of jobs to be managed. We also devise in this thesis a method for studying grid inter-operation mechanisms. We answer using our framework important questions regarding existing grid operation mechanisms, and in particular show that these mechanisms are too limited to cope with real and realistic conditions. We further demonstrate the usefulness of our framework by designing Delegated MatchMaking, a novel mechanism for inter-operating grids. This mechanism is used to operate an architecture that is a hybrid between hierarchical and purely decentralized architectures. The Delegated Matchmaking mechanism attempts to use the local resources of a grid as much as possible and also transparently extends the local environment with resources obtained (delegated) from other sites when resources are not available locally. Our approach is compared with five alternatives through trace-based simulations, and is found to deliver the best performance, especially when the system is heavily loaded. While many other mechanisms can be designed in the future, our experiments prove that the Delegated MatchMaking approach already is a good solution for the problem of grid inter-operation. Our experiments also demonstrate that having grids inter-operate leads to better performance than having the same grids operate independently.
|
[PDF]
[Abstract]
|
| 9 |
|
Efficient Iterative Solution of Large Linear Systems on Heterogeneous Computing Systems
This dissertation deals mainly with the design, implementation, and analysis of efficient iterative
solution methods for large sparse linear systems on distributed and heterogeneous computing
systems as found in Grid computing.
First, a case study is performed on iteratively solving large symmetric linear systems on
both a multi–cluster and a local heterogeneous cluster using standard block Jacobi precondi-
tioning within the software constraints of standard Grid middleware and within the algorith-
mic constraints of preconditioned Conjugate Gradient–type methods. This shows that global
synchronisation is a key bottleneck operation and in order to alleviate this bottleneck, three
main strategies are proposed: exploiting the hierarchical structure of multi–clusters, using asyn-
chronous iterative methods as preconditioners, and minimising the number of inner products in
Krylov subspace methods.
Asynchronous iterative methods have never really been successfully applied to the solution
of extremely large sparse linear systems. The main reason is that the slow convergence rates
limit the applicability of these methods. Nevertheless, the lack of global synchronisation points
in these methods is a highly favourable property in heterogeneous computing systems. Krylov
subspace methods offer significantly improved convergence rates, but the global synchronisation
points induced by the inner product operations in each iteration step limits the applicability.
By using an asynchronous iterative method as a preconditioner in a flexible Krylov subspace
method, the best of both worlds is combined. It is shown that this hybrid combination of a
slow but asynchronous inner iteration and a fast but synchronous outer iteration results in high
convergence rates on heterogeneous networks of computers. Since the preconditionering iteration
is performed on heterogeneous computing hardware, it varies in each iteration step. Therefore,
a flexible iterative method which can handle a varying preconditioner has to be employed. This
partially asynchronous algorithm is implemented on two different types of Grid hardware applied
to two different applications using two different types of Grid middleware.
The IDR(s) method and its variants are new and powerful algorithms for iteratively solving
large nonsymmetric linear systems. Four techniques are used to construct an efficient IDR(s)
variant for parallel computing and in particular for Grid computing. Firstly, an efficient and
robust IDR(s) variant is presented that has a single global synchronisation point per matrix–
vector multiplication step. Secondly, the so–called IDR test matrix in IDR(s) can be chosen
freely and this matrix is constructed such that the work, communication, and storage involving
this matrix are minimised in the context of multi–clusters. Thirdly, a methodology is presented
for a priori estimation of the optimal value of s in IDR(s). Finally, the proposed IDR(s) variant
is combined with an asynchronous preconditioning iteration.
By using an asynchronous preconditioner in IDR(s), the IDR(s) method is treated as a flexible
method, where the preconditioner changes in each iteration step. In order to begin analysing
mathematically the effect of a varying preconditioning operator on the convergence properties
of IDR(s), the IDR(s) method is interpreted as a special type of deflation method. This leads
to a better understanding of the core structure of IDR(s) methods. In particular, it provides an
intuitive explanation for the excellent convergence properties of IDR(s).
Two applications from computational fluid dynamics are considered: large bubbly flow prob-
lems and large (more general) convection–diffussion problems, both in 2D and 3D. However, the
techniques presented can be applied to a wide range of scientific applications.
Large numerical experiments are performed on two heterogeneous computing platforms: (i)
local networks of non–dedicated computers and (ii) a dedicated cluster of clusters linked by a
high–speed network. The numerical experiments not only demonstrate the effectiveness of the
techniques, but they also illustrate the theoretical results.
|
[PDF]
[Abstract]
|