R. Hai | TU Delft Repository

Database is All You Need

Serving LLMs with Relational Queries

Conference paper (2025) - W. Sun (author) , Z. Li (author) , Vaishnav Srinidhi (author) , R. Hai (author)

Large language models (LLMs) have become central to many applications, but their deployment often requires high-performance hardware, specialized libraries, and complex engineering, limiting accessibility for smaller organizations. Meanwhile, relational database systems (RDBMS) a ...

Database as Runtime

Compiling LLMs to SQL for In-database Model Serving

Conference paper (2025) - W. Sun (author) , Z. Li (author) , R. Hai (author)

Deploying large language models (LLMs) often requires specialized hardware and complex frameworks, creating barriers for CPU-based environments with resource constraints. These systems, common in air-gapped or edge scenarios, lack support for maintenance due to security, budget, ...

Qymera

Simulating Quantum Circuits using RDBMS

Conference paper (2025) - Tim Littau (author) , R. Hai (author)

Quantum circuit simulation is crucial for quantum computing tasks such as validating quantum algorithms. We present Qymera, a system that repurposes relational database management systems (RDBMSs) for simulation by translating circuits into SQL queries, allowing quantum operations to r ...

Accelerating machine learning queries with linear algebra query processing

Journal article (2025) - Wenbo Sun (author) , Asterios Katsifodimos (author) , R. Hai (author)

The rapid growth of large-scale machine learning (ML) models has led numerous commercial companies to utilize ML models for generating predictive results to help business decision-making. As two primary components in traditional predictive pipelines, data processing, and model pr ...

Model Selection with Model Zoo via Graph Learning

Conference paper (2024) - Z. Li (author) , H.J. Van Der Wilk (author) , D. Zhan (author) , M. Khosla (author) , A. Bozzon (author) , R. Hai (author)

Pre-trained deep learning (DL) models are increasingly accessible in public repositories, i.e., model zoos. Given a new prediction task, finding the best model to fine-tune can be computationally intensive and costly, especially when the number of pre-trained models is large. Sel ...

LLM-PQA

LLM-enhanced Prediction Query Answering

Conference paper (2024) - Z. Li (author) , Wenjie Zhao (author) , Asterios Katsifodimos (author) , R. Hai (author)

The advent of Large Language Models (LLMs) provides an opportunity to change the way queries are processed, moving beyond the constraints of conventional SQL-based database systems. However, using an LLM to answer a prediction query is still challenging, since an external ML mode ...

SiloFuse

Cross-silo Synthetic Data Generation with Latent Tabular Diffusion Models

Conference paper (2024) - Aditya Shankar (author) , J.C. Brouwer (author) , R. Hai (author) , Y. Chen (author)

Synthetic tabular data is crucial for sharing and augmenting data across silos, especially for enterprises with proprietary data. However, existing synthesizers are designed for centrally stored data. Hence, they struggle with real-world scenarios where features are distributed a ...

Quantum Data Management

From Theory to Opportunities

Conference paper (2024) - Rihan Hai (author) , Shih-Te Hung (author) , Sebastian Feld (author)

Quantum computing has emerged as a transformative tool for future data management. Classical problems in database domains, including query optimization, data integration, and transaction management, have recently been addressed using quantum computing techniques. This tutorial ai ...

Human-in-the-Loop Feature Discovery for Tabular Data

Conference paper (2024) - A. Ionescu (author) , Zeger Mouw (author) , E.A. Aivaloglou (author) , Rihan Hai (author) , Asterios Katsifodimos (author)

In recent years, researchers have developed several methods to automate discovering datasets and augmenting features for training Machine Learning (ML) models. Together with feature selection, these efforts have paved the way towards what is termed the feature discovery process. ...

Amalur

The Convergence of Data Integration and Machine Learning

Journal article (2024) - Z. Li (author) , W. Sun (author) , Danning Zhan (author) , Yan Kang (author) , Y. Chen (author) , Alessandro Bozzon (author) , R. Hai (author)

Machine learning (ML) training data is often scattered across disparate collections of datasets, called data silos. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data residing in different ...

Topio Marketplace: Search and Discovery of Geospatial Data

Conference paper (2023) - A. Ionescu (author) , Alexandra Alexandridou (author) , K. Psarakis (author) , Kostas Patroumpas (author) , Georgios Chatzigeorgakidis (author) , Dimitrios Skoutas (author) , Spiros Athanasiou (author) , R. Hai (author) , Asterios Katsifodimos (author)

The increasing need for data trading has created a high demand for data marketplaces. These marketplaces require a set of value-added services, such as advanced search and discovery, that have been proposed in the database research community for years, but are yet to be put to pra ...

Macaroni: Crawling and Enriching Metadata from Public Model Zoos

Conference paper (2023) - Ziyu Li (author) , Rihan Hai (author) , A. Katsifodimos (author) , A Bozzon (author)

Machine learning (ML) researchers and practitioners are building repositories of pre-trained models, called model zoos. These model zoos contain metadata that detail various properties of the ML models and datasets, which are useful for reporting, auditing, reproducibility, and i ...

Amalur

Data Integration Meets Machine Learning

Conference paper (2023) - R. Hai (author) , Christos Koutras (author) , A. Ionescu (author) , Ziyu Li (author) , Wenbo Sun (author) , Jessie van Schijndel (author) , Yan Kang (author) , A. Katsifodimos (author)

Machine learning (ML) training data is often scattered across disparate collections of datasets, called data silos. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data residing in different sources demand a lot of manua ...

Optimizing Machine Learning Inference Queries for Multiple Objectives

Conference paper (2023) - Z. Li (author) , Mariette Schonfeld (author) , R. Hai (author) , A Bozzon (author) , A Katsifodimos (author)

Given a set of pre-trained Machine Learning (ML) models, can we solve complex analytic tasks that make use of those models by formulating ML inference queries? Can we mitigate different tradeoffs, e.g., high accuracy, low execution costs and memory footprint, when optimizing the ...

Metadata Representations for Queryable Repositories of Machine Learning Models

Journal article (2023) - Z. Li (author) , Henk Kant (author) , R. Hai (author) , A Katsifodimos (author) , Marco Brambilla (author) , Alessandro Bozzon (author)

Machine learning (ML) practitioners and organizations are building model repositories of pre-trained models, referred to as model zoos. These model zoos contain metadata describing the properties of the ML models and datasets. The metadata serves crucial roles for reporting, audi ...

Optimizing ML Inference Queries Under Constraints

Conference paper (2023) - Ziyu Li (author) , W. Sun (author) , Rihan Hai (author) , Alessandro Bozzon (author) , A. Katsifodimos (author)

The proliferation of pre-trained ML models in public Web-based model zoos facilitates the engineering of ML pipelines to address complex inference queries over datasets and streams of unstructured content. Constructing an optimal plan for a query is hard, especially when constraints ...

Accelerating Machine Learning Queries with Linear Algebra Query Processing

Conference paper (2023) - Wenbo Sun (author) , Asterios Katsifodimos (author) , R. Hai (author)

The rapid growth of large-scale machine learning (ML) models has led numerous commercial companies to utilize ML models for generating predictive results to help business decision-making. As two primary components in traditional predictive pipelines, data processing, and model pr ...

Data Lakes

A Survey of Functions and Systems

Journal article (2023) - R. Hai (author) , C. Koutras (author) , Christoph Quix (author) , Matthias Jarke (author)

Data lakes are becoming increasingly prevalent for Big Data management and data analytics. In contrast to traditional 'schema-on-write' approaches such as data warehouses, data lakes are repositories storing raw data in its original formats and providing a common access interface ...

An Empirical Performance Comparison between Matrix Multiplication Join and Hash Join on GPUs

Conference paper (2023) - Wenbo Sun (author) , Asterios Katsifodimos (author) , R. Hai (author)

Recent advances in Graphics Processing Units (GPUs) have facilitated a significant performance boost for database operators, in particular, joins. It has been intensively studied how conventional join implementations, such as hash joins, benefit from the massive parallelism of GPU ...

Metadata Representations for Queryable ML Model Zoos

Conference paper (2022) - Ziyu Li (author) , Rihan Hai (author) , A Bozzon (author) , A. Katsifodimos (author)

Machine learning (ML) practitioners and organizations are building model zoos of pre-trained models, containing metadata describing properties of the ML models and datasets that are useful for reporting, auditing, reproducibility, and interpretability purposes. The metadata is cu ...