Adding fault tolerance to OpenCL

Through redundant heterogeneous computing

Master Thesis (2023)
Author(s)

R.A. Bijl (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Z. Al-Ars – Mentor (TU Delft - Computer Engineering)

C. Lofi – Graduation committee member (TU Delft - Web Information Systems)

Pekka Jääskeläinen – Coach (Tampere University)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Robin Bijl
More Info
Publication Year
2023
Language
English
Graduation Date
30-06-2023
Awarding Institution
Delft University of Technology
Programme
Computer Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The ever-increasing demand for computing has created a need for specialized heterogeneous hardware and for the frameworks required to use it. Alongside traditional central processing units, more and more programs will use specialized hardware to accelerate computations. However, the growth in computing also shortens the mean time between failures. In this thesis, we add fault tolerance to Portable Computing Language (PoCL), an open-source implementation of the OpenCL standard. We show that our solution is easy to apply to existing programs that use PoCL/OpenCL and greatly reduces the total number of errors visible to the end user. It can be used on any device supported by PoCL and incurs low overhead, provided that the hardware requirements are met.

Files

Robin_Bijl_MSC_THESIS.pdf
(pdf | 0.868 MB)
License info not available