Performance modeling of PageRank in large-scale systems

A case study

Master Thesis (2020)
Author(s)

N.A. Doekemeijer (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Ana Lucia Varbanescu – Mentor (Universiteit van Amsterdam)

Henk Sips – Graduation committee member (TU Delft - Data-Intensive Systems)

Przemek Przemysław – Graduation committee member (TU Delft - Embedded Systems)

J. Hidders – Graduation committee member (Birkbeck, University of London)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2020 Niels Doekemeijer
More Info
expand_more
Publication Year
2020
Language
English
Copyright
© 2020 Niels Doekemeijer
Graduation Date
26-08-2020
Awarding Institution
Delft University of Technology
Programme
['Computer Science']
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Graphs are a ubiquitous concept used for modeling entities and their relationships. Large graphs, present in a variety of domains, are often fundamentally difficult to process because of sheer size and irregular computation structure. In recent years, both academia and industry have committed to designing scalable solutions to efficiently process these graphs.

Next to processing large datasets in a distributed environment, a relatively new trend is to accelerate single node computation performance using heterogeneous platforms (for example, by leveraging the GPU as well as the CPU). However, the structure of the input graph can markedly influence the processing speed on a certain platform and it is unclear what would be the most efficient platform for execution given an input dataset.

In this thesis, we will analyze the performance of multiple PageRank implementations for diverse platforms. Using implementations for CPU (using OMP), GPU (using OpenCL and CUDA), and heterogeneous environments (using StarPU and MPI), we will characterize platform performance in relation to the structure of the input dataset. Finally, we will propose and evaluate a performance model for PageRank that takes into account traits of the input graph.

Files

License info not available