Search results | TU Delft Repositories

FPQNet: Fully Pipelined and Quantized CNN for Ultra-Low Latency Image Classification on FPGAs Using OpenCAPI

Ji, M. (author), Al-Ars, Z. (author), Hofstee, H.P. (author), Chang, Yuchun (author), Zhang, Baolin (author)

Convolutional neural networks (CNNs) are known to be effective in many application domains, especially in the computer vision area. To achieve lower-latency CNN processing and reduce power consumption, developers are experimenting with using FPGAs to accelerate CNN processing in several applications. Current FPGA CNN accelerators usually use...

journal article 2023

Tydi-Chisel: Collaborative and Interface-Driven Data-Streaming Accelerators

Cromjongh, Casper (author), Tian, Y. (author), Hofstee, H.P. (author), Al-Ars, Z. (author)

In spite of progress on hardware design languages, the design of high-performance hardware accelerators forces many design decisions specializing the interfaces of these accelerators in ways that complicate the understanding of the design and hinder modularity and collaboration. In response to this challenge, Tydi is presented as an open...

conference paper 2023

An Intermediate Representation for Composable Typed Streaming Dataflow Designs

Reukers, Matthijs A. (author), Tian, Y. (author), Al-Ars, Z. (author), Hofstee, H.P. (author), Brobbel, M. (author), Peltenburg, J.W. (author), van Straten, J. (author)

Tydi is an open specification for streaming dataflow designs in digital circuits, allowing designers to express how composite and variable-length data structures are transferred over streams using clear, data-centric types. These data types are extensively used in many application domains, such as big data and SQL applications. This way,...

journal article 2023

Benchmarking Apache Arrow Flight - A wire-speed protocol for data transfer, querying and microservices

Ahmad, T. (author), Al-Ars, Z. (author), Hofstee, H.P. (author)

Moving structured data between different big data frameworks and/or data warehouses/storage systems often causes significant overhead. Most of the time, more than 80% of the total time spent accessing data elapses in the serialization/de-serialization step. Columnar data formats are gaining popularity in both analytics and transactional...

conference paper 2022

Communication-Efficient Cluster Scalable Genomics Data Processing Using Apache Arrow Flight

Ahmad, T. (author), Ma, Chengxin (author), Al-Ars, Z. (author), Hofstee, H.P. (author)

Current cluster-scaled genomics data processing solutions rely on big data frameworks like Apache Spark, Hadoop and HDFS for data scheduling, processing and storage. These frameworks come with additional computation and memory overheads by default. It has been observed that scaling genomics dataset processing beyond 32 nodes is not efficient on...

conference paper 2022

SALoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs

Park, Seongyeon (author), Kim, Hajin (author), Ahmad, T. (author), Ahmed, N. (author), Al-Ars, Z. (author), Hofstee, H.P. (author), Kim, Youngsok (author), Lee, Jinho (author)

Sequence alignment forms an important backbone in many sequencing applications. A commonly used strategy for sequence alignment is approximate string matching with a two-dimensional dynamic programming approach. Although some prior work has been conducted on GPU acceleration of sequence alignment, we identify several shortcomings that limit...

conference paper 2022

Battling the CPU Bottleneck in Apache Parquet to Arrow Conversion Using FPGA

Peltenburg, J.W. (author), Van Leeuwen, Lars T.J. (author), Hoozemans, J.J. (author), Fang, J. (author), Al-Ars, Z. (author), Hofstee, H.P. (author)

In the domain of big data analytics, the bottleneck of converting storage-focused file formats to in-memory data structures has shifted from the bandwidth of storage to the performance of decoding and decompression software. Two widely used formats for big data storage and in-memory data are Apache Parquet and Apache Arrow, respectively. In...

conference paper 2021

An Attention Module for Convolutional Neural Networks

Zhu, B. (author), Hofstee, H.P. (author), Lee, Jinho (author), Al-Ars, Z. (author)

The attention mechanism has been regarded as an advanced technique to capture long-range feature interactions and to boost the representation capability of convolutional neural networks. However, we found two ignored problems in current attentional activations-based models: the approximation problem and the insufficient capacity problem of the...

conference paper 2021

VC@Scale: Scalable and high-performance variant calling on cluster environments

Ahmad, T. (author), Al-Ars, Z. (author), Hofstee, H.P. (author)

Background: Recently, many new deep learning–based variant-calling methods like DeepVariant have emerged as more accurate compared with conventional variant-calling algorithms such as GATK HaplotypeCaller, Strelka2, and Freebayes, albeit at higher computational costs. Therefore, there is a need for more scalable and higher performance workflows of...

review 2021

FPGA Acceleration for Big Data Analytics: Challenges and Opportunities

Hoozemans, J.J. (author), Peltenburg, J.W. (author), Nonnenmacher, Fabian (author), Hadnagy, A. (author), Al-Ars, Z. (author), Hofstee, H.P. (author)

The big data revolution has ushered in an era with ever increasing volumes and complexity of data requiring ever faster computational analysis. During this very same era, CPU performance growth has been stagnating, pushing the industry to either scale their computation horizontally using multiple nodes in datacenters, or to scale vertically using...

journal article 2021

Generating high-performance FPGA accelerator designs for big data analytics with Fletcher and Apache Arrow

Peltenburg, J.W. (author), van Straten, J. (author), Brobbel, M. (author), Al-Ars, Z. (author), Hofstee, H.P. (author)

As big data analytics systems are squeezing out the last bits of performance of CPUs and GPUs, the next near-term and widely available alternative that industry is considering for higher performance in the data center and cloud is the FPGA accelerator. We discuss several challenges a developer has to face when designing and integrating FPGA...

journal article 2021

NASB: Neural Architecture Search for Binary Convolutional Neural Networks

Zhu, B. (author), Al-Ars, Z. (author), Hofstee, H.P. (author)

Binary Convolutional Neural Networks (CNNs) have significantly reduced the number of arithmetic operations and the size of memory storage needed for CNNs, which makes their deployment on mobile and embedded systems more feasible. However, after binarization, the CNN architecture has to be redesigned and refined significantly for two reasons:...

conference paper 2020

Tydi: an open specification for complex data structures over hardware streams

Peltenburg, J.W. (author), Brobbel, M. (author), van Straten, J. (author), Al-Ars, Z. (author), Hofstee, H.P. (author)

Streaming dataflow designs describe hardware by connecting components through streams that transport data structures. We introduce a stream-oriented specification and type system that provides a clear and intuitive way to map complex, dynamically-sized data structures onto hardware streams. This helps designers to lift the abstraction of...

journal article 2020

An Efficient High-Throughput LZ77-Based Decompressor in Reconfigurable Logic

Fang, J. (author), Chen, Jianyu (author), Lee, Jinho (author), Al-Ars, Z. (author), Hofstee, H.P. (author)

Making the best use of high-bandwidth storage and network technologies requires an improvement in the speed at which we can decompress data. We present a “refine and recycle” method applicable to LZ77-type decompressors that enables efficient high-bandwidth designs and present an implementation in reconfigurable logic. The method refines the write...

journal article 2020

Optimizing performance of GATK workflows using Apache Arrow In-Memory data framework

Ahmad, T. (author), Ahmed, N. (author), Al-Ars, Z. (author), Hofstee, H.P. (author)

Background: Immense improvements in sequencing technologies enable producing large amounts of high-throughput and cost-effective next-generation sequencing (NGS) data. This data needs to be processed efficiently for further downstream analyses. Computing systems need these large amounts of data closer to the processor (with low latency) for...

journal article 2020

ReAF: Reducing approximation of channels by reducing feature reuse within convolution

Zhu, B. (author), Al-Ars, Z. (author), Hofstee, H.P. (author)

High-level feature maps of Convolutional Neural Networks are computed by reusing their corresponding low-level feature maps, which fully exploits feature reuse to improve computational efficiency. This form of feature reuse is referred to as feature reuse between convolutional layers. The second type of feature reuse is referred to...

journal article 2020

Fletcher: A framework to efficiently integrate FPGA accelerators with Apache Arrow

Peltenburg, J.W. (author), van Straten, J. (author), Wijtemans, L. (author), Van Leeuwen, Lars (author), Al-Ars, Z. (author), Hofstee, H.P. (author)

Modern big data systems are highly heterogeneous. The components found in their many layers of abstraction are often implemented in a wide variety of programming languages and frameworks. Due to language implementation differences, interfaces between these components, including hardware accelerated components, are often burdened by...

conference paper 2019

An Accelerator for Posit Arithmetic Targeting Posit Level 1 BLAS Routines and Pair-HMM

van Dam, Laurens (author), Peltenburg, J.W. (author), Al-Ars, Z. (author), Hofstee, H.P. (author)

The newly proposed posit number format uses a significantly different approach to represent floating point numbers. This paper introduces a framework for posit arithmetic in reconfigurable logic that maintains full precision in intermediate results. We present the design and implementation of an L1 BLAS arithmetic accelerator on posit vectors...

conference paper 2019

Supporting Columnar In-memory Formats on FPGA: The Hardware Design of Fletcher for Apache Arrow

Peltenburg, J.W. (author), van Straten, J. (author), Brobbel, M. (author), Hofstee, H.P. (author), Al-Ars, Z. (author)

As a columnar in-memory format, Apache Arrow has seen increased interest from the data analytics community. Fletcher is a framework that generates hardware interfaces based on this format, to be used in FPGA accelerators. This allows efficient integration of FPGA accelerators with various high-level software languages, while providing an easy-to...

conference paper 2019

In-memory database acceleration on FPGAs: a survey

Fang, J. (author), Mulder, Yvo T.B. (author), Hidders, Jan (author), Lee, Jinho (author), Hofstee, H.P. (author)

While FPGAs have seen prior use in database systems, in recent years interest in using FPGAs to accelerate databases has declined in both industry and academia for the following three reasons. First, specifically for in-memory databases, FPGAs integrated with conventional I/O provide insufficient bandwidth, limiting performance. Second, GPUs,...

journal article 2019
