TA

7 records found

Current cluster scaled genomics data processing solutions rely on big data frameworks like Apache Spark, Hadoop and HDFS for data scheduling, processing and storage. These frameworks come with additional computation and memory overheads by default. It has been observed that scali ...

SALoBa

Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs

Sequence alignment forms an important backbone in many sequencing applications. A commonly used strategy for sequence alignment is an approximate string matching with a two-dimensional dynamic programming approach. Although some prior work has been conducted on GPU acceleration o ...
Moving structured data between different big data frameworks and/or data warehouses/storage systems often cause significant overhead. Most of the time more than 80% of the total time spent in accessing data is elapsed in serialization/de-serialization step. Columnar data formats ...
The ever increasing pace of advancements in sequencing technologies has enabled rapid DNA/genome sequencing to become much more accessible. In particular, next (second) and third generation sequencing technologies offer high throughput, massively parallel and cost effective seque ...

VC@Scale

Scalable and high-performance variant calling on cluster environments

Background Recently many new deep learning–based variant-calling methods like DeepVariant have emerged as more accurate compared with conventional variant-calling algorithms such as GATK HaplotypeCaller, Sterlka2, and Freebayes albeit at higher computational costs. Therefore, the ...
Background: Immense improvements in sequencing technologies enable producing large amounts of high throughput and cost effective next-generation sequencing (NGS) data. This data needs to be processed efficiently for further downstream analyses. Computing systems need this large a ...

ArrowSAM

In-Memory Genomics Data Processing Using Apache Arrow

The rapidly growing size of genomics data bases, driven by advances in sequencing technologies, demands fast and cost-effective processing. However, processing this data creates many challenges, particularly in selecting appropriate algorithms and computing platforms. Computing s ...