ZA

Z. Al-Ars

117 records found

GSST

Parallel string decompression at 191 GB/s on GPU

Most of the commonly used compression standards make use of some form of the LZ algorithm. Decompressing this type of data is not a good match for the Single-Instruction, Multiple Thread (SIMT) model of computation used by GPUs, resulting in low throughput and poor utilization of ...

GSST

Parallel string decompression at 191 GB/s on GPU

Most of the commonly used compression standards make use of some form of the LZ algorithm. Decompressing this type of data is not a good match for the Single-Instruction, Multiple Thread (SIMT) model of computation used by GPUs, resulting in low throughput and poor utilization of ...
This paper introduces SENMap, a mapping and synthesis tool for a scalable energy efficient neuromorphic computing architecture frameworks. SENECA a flexible architectural design optimized for executing edge AI SNN/ANN inference applications efficiently. To speed up the silicon ta ...
Synthetic image generation involves the creation of artificially generated images that are indistinguishable from real ones. Conventional simulation-based image synthesis approaches suffer from intensive computational and memory throughput demands associated with physically accur ...
While modern HDLs such as Chisel (Constructing Hardware In a Scala Embedded Language) significantly improve the process of design entry, debugging these designs is often problematic, because the tools that aid debugging operate on translated code rather than the original HDL. Fur ...
In this paper, we present a fully pipelined and semi-parallel channel convolutional neural network hardware accelerator structure. This structure can trade off the compute time and the hardware utilization, allowing the accelerator to be layer pipelined without the need for fully ...

Hardware-Accelerator Design by Composition

Dataflow Component Interfaces with Tydi-Chisel

As dedicated hardware is becoming more prevalent in accelerating complex applications, methods are needed to enable easy integration of multiple hardware components into a single accelerator system. However, this vision of composable hardware is hindered by the lack of standards ...
High accuracy nanopore basecalling uses large deep neural networks, requiring powerful GPUs, which is undesirable for sequencing experiments outside the lab. Research has shown that this can be circumvented by using smaller models to increase efficiency as well as basecalling spe ...

SENSIM

An Event-driven Parallel Simulator for Multi-core Neuromorphic Systems

In this paper, we present SENSIM, which is an open-source simulator designed specifically for the SENECA neuromorphic processor. This simulator is unique in that it combines features from both hardware-specific and hardware-agnostic spiking neural network simulators, resulting in ...
This paper introduces TINA, a novel framework for implementing non Neural Network (NN) signal processing algorithms on NN accelerators such as GPUs, TPUs or FPGAs. The key to this approach is the concept of mapping mathematical and logic functions as a series of convolutional and ...

Beyond quantum Shannon decomposition

Circuit construction for n -qubit gates based on block- ZXZ decomposition

This paper proposes an optimized quantum block-ZXZ decomposition method that results in more optimal quantum circuits than the quantum Shannon decomposition, which was presented in 2005 by M. Möttönen, and J. J. Vartiainen [in Trends in quantum computing research, edited by S. Sh ...
Background

Non-Invasive Prenatal Testing is often performed by utilizing read coverage-based profiles obtained from shallow whole genome sequencing to detect fetal copy number variations. Such screening typically operates on a discretized binned representation of the gen ...
This paper offers a systematic method for creating medical knowledge-grounded patient records for use in activities involving differential diagnosis. Additionally, an assessment of machine learning models that can differentiate between various conditions based on given symptoms i ...
Motion prediction is a key factor towards the full deployment of autonomous vehicles. It is fundamental in order to assure safety while navigating through highly interactive complex scenarios. In this work, the framework IAMP (Interaction-Aware Motion Prediction), producing multi ...

QKSA

Quantum Knowledge Seeking Agent

In this research, we extend the universal reinforcement learning agent models of artificial general intelligence to quantum environments. The utility function of a classical exploratory stochastic Knowledge Seeking Agent, KL-KSA, is generalized to distance measures from quantum i ...
Many researchers have proposed replacing the aggregation server in federated learning with a blockchain system to improve privacy, robustness, and scalability. In this approach, clients would upload their updated models to the blockchain ledger and use a smart contract to perform ...
Tydi is an open specification for streaming dataflow designs in digital circuits, allowing designers to express how composite and variable-length data structures are transferred over streams using clear, data-centric types. These data types are extensively used in a many applicat ...
In spite of progress on hardware design languages, the design of high-performance hardware accelerators forces many design decisions specializing the interfaces of these accelerators in ways that complicate the understanding of the design and hinder modularity and collaboration. ...
Convolutional neural networks (CNNs) are to be effective in many application domains, especially in the computer vision area. In order to achieve lower latency CNN processing, and reduce power consumption, developers are experimenting with using FPGAs to accelerate CNN processing ...