Accelerating machine learning queries with linear algebra query processing

Journal Article (2025)

Author(s)

W. Sun (TU Delft - Web Information Systems)

A Katsifodimos (TU Delft - Data-Intensive Systems)

R. Hai (TU Delft - Web Information Systems)

Research Group

Web Information Systems

DOI related publication

https://doi.org/10.1007/s10619-024-07451-7

Machine learning, Database, Query optimization, Operator fusion

To reference this document use:

https://resolver.tudelft.nl/uuid:e3bf7298-3377-4453-b3f0-035c279364a3

More Info

Publication Year

2025

Language

English

Issue number

1

Volume number

43

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The rapid growth of large-scale machine learning (ML) models has led numerous commercial companies to use ML models to generate predictions that support business decision-making. The two primary components of a traditional predictive pipeline, data processing and model prediction, often operate in separate execution environments, leading to redundant engineering and computation. Additionally, the diverging mathematical foundations of data processing and machine learning hinder cross-optimization across these two components, so potential opportunities to speed up predictive pipelines are overlooked. In this paper, we propose an operator fusion method based on GPU-accelerated linear algebraic evaluation of relational queries. Our method leverages linear algebra computation properties to merge operators in machine learning predictions and data processing, accelerating predictive pipelines by up to 317x. We perform a complexity analysis to deliver quantitative insights into the advantages of operator fusion, considering various data and model dimensions. Furthermore, we extensively evaluate linear algebra query processing and operator fusion using the widely used Star Schema and TPC-DI benchmarks. Through comprehensive evaluations, we demonstrate the effectiveness and potential of our approach in improving the efficiency of data processing and machine learning workloads on modern hardware.
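The idea of merging a relational operator with a model operator can be illustrated with a small sketch. It is not the paper's implementation: it assumes, purely for illustration, that a key/foreign-key join is encoded as a one-hot selection matrix S, that the dimension table is a dense feature matrix D, and that the model is a single linear layer w. All table contents, names, and helper functions below are hypothetical, and the code runs on the CPU in plain Python rather than on a GPU. Under these assumptions, associativity of matrix multiplication lets the prediction (S D) w be rewritten as S (D w), so the model is applied once per dimension row instead of once per joined row.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def one_hot_join_matrix(foreign_keys, n_dim_rows):
    """Encode a key/foreign-key join as a 0/1 selection matrix.

    Row i has a single 1 in the column of the dimension row that
    fact row i references (an illustrative encoding, assumed here).
    """
    return [[1 if fk == j else 0 for j in range(n_dim_rows)] for fk in foreign_keys]


# Hypothetical dimension table: 3 rows with 2 numeric features each.
dim_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Hypothetical fact table: each entry is a foreign key into dim_features.
fact_fks = [2, 0, 0, 1]
# Hypothetical linear model with one output.
weights = [[0.5], [-1.0]]

S = one_hot_join_matrix(fact_fks, len(dim_features))

# Unfused pipeline: materialize the join result, then run the model on it.
unfused = matmul(matmul(S, dim_features), weights)

# Fused pipeline: apply the model to the small dimension table first,
# then let the join matrix route the per-row predictions to fact rows.
fused = matmul(S, matmul(dim_features, weights))
```

Both orderings yield the same predictions; the fused form avoids materializing the joined feature matrix, which matters most when the fact table is much larger than the dimension table.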

Files

S10619-024-07451-7.pdf

(pdf | 2.39 Mb)