ZygosDB: An efficient read-only database for Genome-Wide Association Studies (GWAS)

Bachelor Thesis (2024)
Author(s)

N. van Luijk (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

N. Tesi – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

MJT Reinders – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
23-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper presents ZygosDB, a novel read-only database designed specifically for efficient querying of the positional genomic data required for Genome-Wide Association Studies (GWAS). ZygosDB addresses limitations of existing solutions such as Tabix through optimized data structures, compression techniques, and parallel query execution.
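To make the idea of a compressed, position-indexed, read-only store concrete, the following is a minimal sketch. It is not ZygosDB's actual format or API, which the abstract does not specify. The names `build_blocks` and `query`, the tab-separated record layout, the tiny block size, and the use of `zlib` are all assumptions chosen for illustration. Records sorted by position are grouped into compressed blocks, and a range query binary-searches the block start positions so that only the relevant blocks are decompressed.

```python
import bisect
import zlib


def build_blocks(records, block_size=2):
    """Group position-sorted (position, payload) records into compressed blocks.

    Hypothetical layout: each block stores "position<TAB>payload" lines.
    Returns (starts, blocks), where starts[i] is the first position in blocks[i].
    """
    starts, blocks = [], []
    for i in range(0, len(records), block_size):
        chunk = records[i:i + block_size]
        starts.append(chunk[0][0])
        text = "\n".join(f"{pos}\t{payload}" for pos, payload in chunk)
        blocks.append(zlib.compress(text.encode()))
    return starts, blocks


def query(starts, blocks, lo, hi):
    """Return all records with lo <= position <= hi."""
    # Start at the last block whose first position is strictly below lo;
    # it may still contain records at or above lo.
    first = max(bisect.bisect_left(starts, lo) - 1, 0)
    out = []
    for i in range(first, len(blocks)):
        if starts[i] > hi:
            break  # every later block starts beyond the queried range
        for line in zlib.decompress(blocks[i]).decode().split("\n"):
            pos_text, payload = line.split("\t", 1)
            pos = int(pos_text)
            if lo <= pos <= hi:
                out.append((pos, payload))
    return out


# Illustrative data only: positions and variant identifiers are made up.
records = [(100, "rs1"), (250, "rs2"), (300, "rs3"), (900, "rs4"), (1200, "rs5")]
starts, blocks = build_blocks(records)
hits = query(starts, blocks, 200, 900)
```

Because the store is read-only, the index of block start positions never changes after construction, so independent queries could be served by separate threads without locking. This sketch leaves that part out.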

Our evaluation shows that ZygosDB achieves 2 to 5 times the query throughput of Tabix. This improvement comes from our focus on efficient data storage and retrieval tailored to the specific needs of GWAS.

The paper also explores the impact of multi-threading on query performance and the role of compression algorithms in optimizing query throughput. We observe that performance decreases beyond a certain number of threads, and we discuss the influence of compression algorithms such as Gzip and LZ4.

While ZygosDB offers substantial performance gains, future work should measure query latency, refine memory usage, and investigate specialized column support. Overall, ZygosDB is a powerful tool for efficient querying of large genomic datasets, facilitating more effective GWAS.

Files

Research_paper_14_.pdf
(pdf | 0.407 Mb)
License info not available