ZL

Z. Li

info

Please Note

11 records found

Compiling LLMs to SQL for In-database Model Serving

Conference paper (2025) - Wenbo Sun, Ziyu Li, Rihan Hai
Deploying large language models (LLMs) often requires specialized hardware and complex frameworks, creating barriers for CPU-based environments with resource constraints. These systems, common in air-gapped or edge scenarios, lack support for maintenance due to security, budget, or technical limits. To address this, we introduce TranSQL+, a compiler that translates LLM inference into SQL queries, enabling deployment on relational databases. By converting transformer operations into relational algebra, TranSQL+ generates vector-oriented SQL queries that leverage native database features (buffer management, indexing) to manage computations without hardware accelerators or deep learning frameworks. Demonstrated with the LLaMA3.1 8B model on DuckDB, results show relational databases can effectively serve LLMs, reducing deployment barriers and expanding access to advanced AI. ...
Doctoral thesis (2024) - Z. Li, G.J.P.M. Houben, A. Bozzon, A. Katsifodimos
Over the last two decades, the machine learning (ML) field has witnessed a dramatic expansion, propelled by burgeoning data volumes and the advancement of computational technologies. Deep learning (DL) in particular has demonstrated remarkable success across a wide range of domains, including healthcare, mobility, life sciences, and energy systems. This success has been further accelerated by the availability and efficiency of open-source ML frameworks like TensorFlow and PyTorch, making ML methodologies more accessible than ever.

However, this rapid growth has brought its own set of challenges. The proliferation of ML models and related artifacts, such as datasets, have brought abundant information during the ML lifecycle. The descriptive and property information of these artifacts is referred as metadata. Yet current practices, such as model cards used in public model zoos and tools to track metadata within scripts, cannot fully captured the metadata of these artifacts, let alone a standardized approach for their management, and access. In addition, the prevailing practice of managing ML/DL scripts via traditional software repositories, while adequate for software engineering, falls short in addressing the unique needs of ML workflows, such as model reuse and comparative analysis. These practices hinder the effective use of structured and comprehensive metadata representation. This disconnect points to a pressing need for improved methodologies and tools in the ML field.

In response to these challenges, this thesis delves into the development and exploitation of structured metadata representations within ML model zoos. In Chapter 2, we first propose a metamodel that represent different types of metadata, thus transforming the metadata from being merely descriptive to being queryable and machine-readable. The structured nature of our metamodel allows for more efficient querying and retrieval of information, which is a substantial improvement over the traditional, text-based descriptions.

Additionally, the thesis explores the use of metadata to optimize various ML processes, particularly in the selection of appropriate models for specific tasks, i.e., model inference and fine-tuning. In Chapter 3, we investigate the optimization of ML inference queries in heterogeneous model zoos using a Mixed-Integer-Programming-based optimizer. This optimizer, which considers multiple objectives such as accuracy and inference speed, provides a robust framework for model selection and execution planning. In Chapter 4, the research extends to model selection for fine-tuning. We investigate on predicting model performance, particularly accuracy, in scenarios where data domains shift, thus negating the need for constant model fine-tuning. By selectively choosing only the most promising candidates, this method substantially lowers the computational burden and associated costs of extensive model fine-tuning.

Overall, this thesis investigates the representation and application of metadata. The insights and methodologies presented not only improve the efficiency and effectiveness of ML workflows but also pave the way for further exploration in the integration of metadata within ML practices, highlighting the continual development and potential for advancements in ML.
...

The Convergence of Data Integration and Machine Learning

Journal article (2024) - Ziyu Li, Wenbo Sun, Danning Zhan, Yan Kang, Lydia Chen, Alessandro Bozzon, Rihan Hai
Machine learning (ML) training data is often scattered across disparate collections of datasets, called <italic>data silos</italic>. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data residing in different sources demand a lot of manual work and computational resources. With data privacy constraints, data often cannot leave the premises of data silos; hence model training should proceed in a decentralized manner. In this work, we present a vision of bridging traditional data integration (DI) techniques with the requirements of modern machine learning systems. We explore the possibilities of utilizing metadata obtained from data integration processes for improving the effectiveness, efficiency, and privacy of ML models. Towards this direction, we analyze ML training and inference over data silos. Bringing data integration and machine learning together, we highlight new research opportunities from the aspects of systems, representations, factorized learning, and federated learning. ...
Journal article (2023) - Ziyu Li, Henk Kant, Rihan Hai, Asterios Katsifodimos, Marco Brambilla, Alessandro Bozzon
Machine learning (ML) practitioners and organizations are building model repositories of pre-trained models, referred to as model zoos. These model zoos contain metadata describing the properties of the ML models and datasets. The metadata serves crucial roles for reporting, auditing, ensuring reproducibility, and enhancing interpretability. Despite the growing adoption of descriptive formats like datasheets and model cards, the metadata available in existing model zoos remains notably limited. Moreover, existing formats have limited expressiveness, thus constraining the potential use of model repositories, extending their purpose beyond mere storage for pre-trained models. This paper proposes a unified metadata representation format for model zoos. We illustrate that comprehensive metadata enables a diverse range of applications, encompassing model search, reuse, comparison, and composition of ML models. We also detail the design and highlight the implementation of an advanced model zoo system built on top of our proposed metadata representation. ...

Data Integration Meets Machine Learning

Machine learning (ML) training data is often scattered across disparate collections of datasets, called data silos. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data residing in different sources demand a lot of manual work and computational resources. With data privacy and security constraints, data often cannot leave the premises of data silos, hence model training should proceed in a decentralized manner. In this work, we present a vision of how to bridge the traditional data integration (DI) techniques with the requirements of modern machine learning. We explore the possibilities of utilizing metadata obtained from data integration processes for improving the effectiveness and efficiency of ML models. Towards this direction, we analyze two common use cases over data silos, feature augmentation and federated learning. Bringing data integration and machine learning together, we highlight new research opportunities from the aspects of systems, representations, factorized learning and federated learning. ...
Conference paper (2023) - Z. Li, R. Hai, A Katsifodimos, A. Bozzon
Machine learning (ML) researchers and practitioners are building repositories of pre-trained models, called model zoos. These model zoos contain metadata that detail various properties of the ML models and datasets, which are useful for reporting, auditing, reproducibility, and interpretability. Unfortunately, the existing metadata representations come with limited expressivity and lack of standardization. Meanwhile, an interoperable method to store and query model zoo metadata is missing. These two gaps hinder model search, reuse, comparison, and composition. In this demo paper, we advocate for standardized ML model metadata representation, proposing Macaroni, a metadata search engine with toolkits that support practitioners to obtain and enrich that metadata. ...
Conference paper (2023) - Z. Li, W. Sun, R. Hai, A. Bozzon, A Katsifodimos
The proliferation of pre-trained ML models in public Web-based model zoos facilitates the engineering of ML pipelines to address complex inference queries over datasets and streams of unstructured content. Constructing optimal plan for a query is hard, especially when constraints (e.g. accuracy or execution time) must be taken into consideration, and the complexity of the inference query increases. To address this issue, we propose a method for optimizing ML inference queries that selects the most suitable ML models to use, as well as the order in which those models are executed. We formally define the constraint-based ML inference query optimization problem, formulate it as a Mixed Integer Programming (MIP) problem, and develop an optimizer that maximizes accuracy given constraints. This optimizer is capable of navigating a large search space to identify optimal query plans on various model zoos. ...
Given a set of pre-trained Machine Learning (ML) models, can we solve complex analytic tasks that make use of those models by formulating ML inference queries? Can we mitigate different tradeoffs, e.g., high accuracy, low execution costs and memory footprint, when optimizing the queries? In this work we present different multi-objective ML inference query optimization strategies, and compare them on their usability, applicability, and complexity. We formulate Mixed-Integer-Programming-based (MIP) optimizers for ML inference queries that makes use of different objectives to find Pareto-optimal inference query plans. ...
Machine learning (ML) practitioners and organizations are building model zoos of pre-trained models, containing metadata describing properties of the ML models and datasets that are useful for reporting, auditing, reproducibility, and interpretability purposes. The metatada is currently not standardised; its expressivity is limited; and there is no interoperable way to store and query it. Consequently, model search, reuse, comparison, and composition are hindered. In this paper, we advocate for standardized ML model metadata representation and management, proposing a toolkit supported to help practitioners manage and query that metadata. ...
Journal article (2020) - Xiu Xiu Zhan, Ziyu Li, Naoki Masuda, Petter Holme, Huijuan Wang
Link prediction can be used to extract missing information, identify spurious interactions as well as forecast network evolution. Network embedding is a methodology to assign coordinates to nodes in a low-dimensional vector space. By embedding nodes into vectors, the link prediction problem can be converted into a similarity comparison task. Nodes with similar embedding vectors are more likely to be connected. Classic network embedding algorithms are random-walk-based. They sample trajectory paths via random walks and generate node pairs from the trajectory paths. The node pair set is further used as the input for a Skip-Gram model, a representative language model that embeds nodes (which are regarded as words) into vectors. In the present study, we propose to replace random walk processes by a spreading process, namely the susceptible-infected (SI) model, to sample paths. Specifically, we propose two susceptible-infected-spreading-based algorithms, i.e., Susceptible-Infected Network Embedding (SINE) on static networks and Temporal Susceptible-Infected Network Embedding (TSINE) on temporal networks. The performance of our algorithms is evaluated by the missing link prediction task in comparison with state-of-the-art static and temporal network embedding algorithms. Results show that SINE and TSINE outperform the baselines across all six empirical datasets. We further find that the performance of SINE is mostly better than TSINE, suggesting that temporal information does not necessarily improve the embedding for missing link prediction. Moreover, we study the effect of the sampling size, quantified as the total length of the trajectory paths, on the performance of the embedding algorithms. The better performance of SINE and TSINE requires a smaller sampling size in comparison with the baseline algorithms. Hence, SI-spreading-based embedding tends to be more applicable to large-scale networks. ...
Cameras are ubiquitous nowadays and video analytic systems have been widely used in surveillance, traffic control, business intelligence and autonomous driving. Some applications, e.g., detecting road congestion in traffic monitoring, require continuous and timely reporting of complex patterns. However, conventional complex event processing (CEP) systems fail to support video processing, while the existing video query languages offer limited support for expressing advanced CEP queries, such as iteration, and window. In this PhD research, we aim to develop systems and methods to alleviate these issues. In this paper, we first identify the need for an expressive CEP language which allows users to define queries over video streams, and receive fast, accurate results. To evaluate CEP queries on videos in real-time and with high accuracy, we explain how a streaming query engine can be designed to provide native support of machine learning (ML) models for fast and accurate inference on video streams. In addition, we describe a set of optimization problems that arise when ML models, with trade-offs in speed, accuracy, and cost, are part of a query plan. Finally, we describe how query plans on real-time videos can be optimized and deployed on edge devices with limited computational and network capabilities. ...