Implementation and evaluation of Ordo
A high performance data processing system
M. Melas (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Jan Rellermeyer – Mentor (TU Delft - Data-Intensive Systems)
Lydia Y. Chen – Graduation committee member (TU Delft - Data-Intensive Systems)
A. Katsifodimos – Graduation committee member (TU Delft - Web Information Systems)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Data processing systems have become increasingly important in modern computing, as the volume and complexity of the data that needs to be analyzed have grown dramatically. Many scalable, resilient and performant data processing systems have been developed, and more are under development.
However, despite the advances made in data processing technology, challenges remain in optimizing the performance, energy efficiency and practicality of these systems. One such challenge is managing the underlying system's resources effectively, which includes tracking the system's throughput and the amount of work each operator has to do, and using data structures that lead to faster task processing.
To address this challenge, this thesis proposes the implementation of a high-performance data processing system that exposes the underlying system's metrics to the application level and applies an innovative approach to operator communication based on an efficient thread-safe data structure. With the underlying system's metrics available, the application's scheduler can schedule tasks according to the current state of the system and adjust the system's resources at runtime. This relieves developers of having to fine-tune the system beforehand and allows the system to handle fluctuating input workloads more efficiently.
This thesis explores the design and implementation of such a system, as well as its impact on the performance, energy efficiency and resiliency of data processing applications. We provide performance measurements and a qualitative comparison of our system with other state-of-the-art systems, and show that the results confirm our hypotheses.
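The idea summarized above, a scheduler that reads runtime metrics and operators that communicate through a thread-safe data structure, can be illustrated with a minimal Python sketch. It is not Ordo's implementation: the function names, the backlog-based scaling policy, the default thresholds and the use of Python's standard `queue.Queue` are all assumptions made for illustration only.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def desired_workers(backlog, per_worker=100, min_workers=1, max_workers=8):
    """Hypothetical policy: one worker per `per_worker` queued items, clamped."""
    needed = -(-backlog // per_worker)  # ceiling division
    return max(min_workers, min(max_workers, needed))


def run_stage(items, workers):
    """Run an illustrative 'double the value' operator on `workers` threads.

    Operators exchange data only through thread-safe queues.
    """
    inbox = queue.Queue()
    outbox = queue.Queue()

    def operator():
        while True:
            item = inbox.get()
            if item is SENTINEL:
                break
            outbox.put(item * 2)

    threads = [threading.Thread(target=operator) for _ in range(workers)]
    for t in threads:
        t.start()
    for item in items:
        inbox.put(item)
    for _ in threads:
        inbox.put(SENTINEL)  # one marker per worker so every thread stops
    for t in threads:
        t.join()
    # Output order depends on thread interleaving, so sort for a stable result.
    return sorted(outbox.get() for _ in range(len(items)))


if __name__ == "__main__":
    data = list(range(250))
    # The backlog size stands in for a system metric exposed to the scheduler.
    workers = desired_workers(len(data))
    print(workers, run_stage(data, workers)[:5])
```

Here the scheduler decision is reduced to a single function of the observed backlog. A real system would presumably combine several metrics and re-evaluate the decision continuously while the job runs.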