ZygosDB: An efficient read-only database for Genome-Wide Association Studies (GWAS)
Abstract
This paper describes ZygosDB, an efficient read-only database designed specifically for querying the positional genomic data required for Genome-Wide Association Studies (GWAS). ZygosDB addresses limitations of existing solutions such as Tabix through optimized data structures, compression, and parallel query execution.
Our evaluation shows that ZygosDB achieves 2 to 5 times the query throughput of Tabix. We attribute this improvement to data storage and retrieval tailored to the specific needs of GWAS.
The paper also examines how multi-threading and the choice of compression algorithm affect query throughput. We observe that performance decreases beyond a certain number of threads, and we discuss the influence of compression algorithms such as Gzip and LZ4.
While ZygosDB offers substantial performance gains, future work should measure query latency, refine memory usage, and investigate specialized column support. Overall, ZygosDB is an effective tool for querying large genomic datasets and can facilitate more effective GWAS.