A framework for the study of grid inter-operation mechanisms

Abstract

The study of the history of computing infrastructures reveals an integration trend. For example, the explosive growth of the Internet in the 1990s was the result of an integration process that started in the 1960s with the emerging networks of computers. Through the Internet, millions of users became able to access information anytime and anywhere, much as they use other daily utilities such as water, electricity, and telephone. However, an important category of users remained under-served: users with large computational and storage requirements, e.g., scientists, companies that focus on data analysis, and governmental departments that manage the interaction between the state and the population (such as census, tax, and public health). Thus, in the mid-1990s, the vision of the (Computing) Grid as a universal computing utility was formulated. The main benefits promised by the Grid are similar to those of other integration efforts: extended and optimized service from the integrated network, and significant reductions in maintenance and operation costs through sharing and better scheduling. While the universal Grid has yet to be developed, large-scale distributed computing infrastructures that provide their users with seamless and secure access to computing resources, individually called Grid parts or simply grids, have been built throughout the world: in different countries, for different sciences, and both for production work and for computer-science research. At the same time, the main technological alternatives to grids, that is, supercomputers and large clusters, have evolved into much larger, more scalable, and more reliable systems. Thus, integrating existing grids into larger infrastructures, and finally into The Grid, is key to keeping the grid vision attractive for its potential users.
The integration of grids raises a double challenge: the first is related to the efficient scaling of a distributed computing system, the second to the operation of a system across different ownership and administrative domains. Thus, many of the traditional approaches for inter-operating computer systems, such as those based on completely centralized or purely decentralized system architectures, are ruled out from the start. To distinguish the typical problem of integrating smaller components into a larger system from the double challenge of grid integration, we call the latter the problem of grid inter-operation. In this thesis we approach the problem of grid inter-operation with two main objectives: to design a comprehensive framework for the study of grid inter-operation mechanisms, and to provide an initial but good solution to this problem. Our framework includes a toolbox for grid inter-operation research and a method for the study of grid inter-operation mechanisms. The research toolbox comprises the Grid Workloads Archive (GWA), a comprehensive model for grid resources and workloads, the GrenchMark performance evaluation framework, and the Delft Grid Simulation (DGSim) framework for repeated and realistic simulations of multi-cluster and multi-grid environments. The GWA and our comprehensive model show that, in practice, grid computing is used mostly for single-processor jobs rather than for parallel computing, which raises previously ignored challenges related to the volume of jobs to be managed. Using our framework, we answer important questions regarding existing grid inter-operation mechanisms; in particular, we show that these mechanisms are too limited to cope with real and realistic conditions.
We further demonstrate the usefulness of our framework by designing Delegated MatchMaking, a novel mechanism for inter-operating grids. This mechanism operates an architecture that is a hybrid between hierarchical and purely decentralized architectures. Delegated MatchMaking uses the local resources of a grid as much as possible and, when resources are not available locally, transparently extends the local environment with resources obtained (delegated) from other sites. Through trace-based simulations, we compare our approach with five alternatives and find that it delivers the best performance, especially when the system is heavily loaded. While many other mechanisms may be designed in the future, our experiments show that Delegated MatchMaking is already a good solution to the problem of grid inter-operation. Our experiments also demonstrate that having grids inter-operate leads to better performance than having the same grids operate independently.
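As a rough illustration of the delegation idea described above, the following Python sketch allocates processors for a job by first using the submitting site's idle processors and then borrowing (delegating) idle processors from other sites. It is a deliberately simplified, hypothetical model, not the mechanism as implemented in this thesis: the `Site` class, the `delegated_matchmaking` function, the single resource type (identical processors), and the breadth-first search over site links are all assumptions made for this example, and job queuing, resource release, and site policies are not modeled.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity equality: sites link to each other cyclically
class Site:
    """Hypothetical grid site (e.g., one cluster) with identical processors."""

    name: str  # assumed unique across the system
    free: int  # processors currently idle at this site
    # Links to other sites; may mix hierarchical (parent/child) and peer links.
    neighbors: list[Site] = field(default_factory=list, repr=False)
    # Processors borrowed from other sites: {lender_name: count}.
    delegated_in: dict[str, int] = field(default_factory=dict)

    def link(self, other: Site) -> None:
        """Create a bidirectional link between two sites."""
        self.neighbors.append(other)
        other.neighbors.append(self)


def delegated_matchmaking(origin: Site, procs: int) -> dict[str, int]:
    """Try to allocate `procs` processors for a job submitted at `origin`.

    Idle processors at `origin` are used first; any remainder is delegated
    from other sites, visited breadth-first along the links (assumed order).
    The allocation is all-or-nothing: returns {site_name: processors_taken},
    or {} (changing nothing) if too few idle processors are reachable.
    """
    if procs <= 0:
        raise ValueError("a job must request at least one processor")
    needed = procs
    chosen: list[tuple[Site, int]] = []
    seen = {origin.name}
    queue = deque([origin])
    while queue and needed > 0:
        site = queue.popleft()
        take = min(site.free, needed)
        if take > 0:
            chosen.append((site, take))
            needed -= take
        for nb in site.neighbors:
            if nb.name not in seen:
                seen.add(nb.name)
                queue.append(nb)
    if needed > 0:
        return {}  # the job would wait in the local queue (not modeled here)

    plan: dict[str, int] = {}
    for site, take in chosen:
        site.free -= take
        plan[site.name] = take
        if site is not origin:
            # Delegated processors extend the origin's local environment.
            origin.delegated_in[site.name] = origin.delegated_in.get(site.name, 0) + take
    return plan


if __name__ == "__main__":
    # Hypothetical topology: a parent site "root" with child clusters A and B.
    root, a, b = Site("root", free=4), Site("A", free=2), Site("B", free=8)
    root.link(a)
    root.link(b)
    print(delegated_matchmaking(a, 2))   # {'A': 2}  (fits locally)
    print(delegated_matchmaking(a, 6))   # {'root': 4, 'B': 2}  (delegated)
    print(delegated_matchmaking(a, 20))  # {}  (not enough idle processors)
```

Checking that enough processors are reachable before committing keeps each allocation all-or-nothing, so a request that cannot be satisfied leaves every site unchanged. Recording borrowed processors in `delegated_in` mirrors the description above, in which delegated resources extend the local environment of the requesting site.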