Search results | TU Delft Repositories

Searched for: subject:"big data"

(1 - 3 of 3)

document: Communication-Efficient Cluster Scalable Genomics Data Processing Using Apache Arrow Flight
Ahmad, T. (author), Ma, Chengxin (author), Al-Ars, Z. (author), Hofstee, H.P. (author)
Current cluster-scaled genomics data processing solutions rely on big data frameworks like Apache Spark, Hadoop and HDFS for data scheduling, processing and storage. These frameworks come with additional computation and memory overheads by default. It has been observed that scaling genomics dataset processing beyond 32 nodes is not efficient on...
conference paper 2022

document: Optimizing performance of GATK workflows using Apache Arrow In-Memory data framework
Ahmad, T. (author), Ahmed, N. (author), Al-Ars, Z. (author), Hofstee, H.P. (author)
Background: Immense improvements in sequencing technologies enable the production of large amounts of high-throughput and cost-effective next-generation sequencing (NGS) data. This data needs to be processed efficiently for further downstream analyses. Computing systems need these large amounts of data closer to the processor (with low latency) for...
journal article 2020

document: ArrowSAM: In-Memory Genomics Data Processing Using Apache Arrow
Ahmad, T. (author), Ahmed, N. (author), Peltenburg, J.W. (author), Al-Ars, Z. (author)
The rapidly growing size of genomics databases, driven by advances in sequencing technologies, demands fast and cost-effective processing. However, processing this data creates many challenges, particularly in selecting appropriate algorithms and computing platforms. Computing systems need data closer to the processor for fast processing....
conference paper 2020