Graph processing on systems with disaggregated memory
Aiding financial crime detection in large datasets
K. Khalili (TU Delft - Electrical Engineering, Mathematics and Computer Science)
H.P. Hofstee – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Kubilay Atasu – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Z. Al-Ars – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
With the rise of memory costs and the persistent under-utilization of memory in clusters, researchers have begun exploring alternative approaches to improve memory efficiency and reduce operational costs. Resource disaggregation is becoming increasingly common and sought after, driven by the emergence of new interconnect standards such as CXL and, previously, OpenCAPI. While the industry is primarily moving toward memory pooling, where memory is dynamically provisioned among applications or virtual machines, this work investigates distributed memory disaggregation and sharing. IBM's Power10 processors include hardware support that enables multiple systems to directly share memory. However, few applications have been developed to take advantage of disaggregated shared memory.
Because Memory Inception, the memory disaggregation hardware of the Power10 processors, is not yet fully operational, this work implements a practical application on a ThymesisFlow prototype that was upgraded, with the help of Apache Arrow, to support a shared disaggregated memory system. The selected application is a graph processor capable of detecting money laundering patterns in financial transaction graphs in real time. These patterns yield transaction features that machine learning algorithms can use to identify fraudulent financial transactions.
Our proof-of-concept implementation enables the creation of a distributed graph, represented as Apache Arrow tables, on which large datasets can be processed in real time. The graph resides in a shared disaggregated memory region and can be accessed by multiple systems without data copying, incurring lower latency penalties than network-based data retrieval. The distributed graph processor was developed and tested using the ThymesisFlow prototype provided by the Hasso Plattner Institute.
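
The idea of a columnar graph that several readers access without copying can be illustrated with a minimal single-machine sketch. It is not the thesis implementation: Python's standard `multiprocessing.shared_memory` module stands in for the ThymesisFlow shared region, two contiguous int64 columns stand in for an Apache Arrow edge table, and the per-account fan-out count is only an assumed example of a transaction feature (the abstract does not specify which patterns are detected). The names `EDGES`, `write_edge_columns`, and `fan_out` are hypothetical.

```python
from multiprocessing import shared_memory

# Hypothetical toy transaction graph: (source account, destination account).
EDGES = [(0, 1), (1, 2), (2, 0), (0, 3)]


def write_edge_columns(edges):
    """Store the edges as two contiguous int64 columns (src, then dst)
    in a newly created shared memory segment and return the segment."""
    n = len(edges)
    shm = shared_memory.SharedMemory(create=True, size=2 * n * 8)
    cols = shm.buf.cast("q")  # int64 view over the shared bytes
    for i, (src, dst) in enumerate(edges):
        cols[i] = src        # column 0: source accounts
        cols[n + i] = dst    # column 1: destination accounts
    cols.release()
    return shm


def fan_out(shm_name, n_edges, account):
    """Attach to the existing segment by name and count the outgoing
    transactions of `account`, reading the source column in place."""
    shm = shared_memory.SharedMemory(name=shm_name)
    cols = shm.buf.cast("q")
    src = cols[:n_edges]  # slicing a memoryview creates a view, not a copy
    count = sum(1 for s in src if s == account)
    # Views must be released before the segment can be closed.
    src.release()
    cols.release()
    shm.close()
    return count


if __name__ == "__main__":
    segment = write_edge_columns(EDGES)
    try:
        print(fan_out(segment.name, len(EDGES), account=0))
    finally:
        segment.close()
        segment.unlink()
```

In this sketch the reader attaches by segment name and scans the source column through a `memoryview`, so no edge data is duplicated. A second process on the same machine could call `fan_out` with the same name. Sharing across machines, as described in the abstract, relies on the disaggregated memory hardware and is outside what this example shows.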