T.G.R.M. van Leuken
Jumping Shift
A Logarithmic Quantization Method for Low-Power CNN Acceleration
DSP blocks are an efficient way to implement multiply-accumulate (MAC) operations on FPGAs. However, because DSP blocks contain wide multiplier and adder blocks, MAC operations on low bit-length parameters leave them underutilized. Hence, an efficient approximation technique is introduced. The technique manipulates and approximates the low bit-length parameters based on a Single DSP - Multiple Multiplication (SDMM) execution. The accuracy of the developed optimization technique was evaluated for different CNN weight bit precisions using the AlexNet and VGG-16 networks and the ImageNet ILSVRC-2012 dataset. The optimization can be implemented without loss of accuracy in almost all cases, and it causes slight accuracy losses in a few cases. Through these optimizations, multiple parameter multiplications are performed in a single DSP block at the cost of a small hardware overhead. As a result of our optimizations, the parameters are represented in a different format in off-chip memory, providing up to 33% compression without any hardware cost. A prototype systolic array architecture employing our optimizations was implemented on a Xilinx Zynq FPGA. It reduced the number of DSP blocks by 66.6%, 75%, and 83.3% for 8, 6, and 4-bit input variables, respectively.
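The idea of sharing one wide multiplier among several narrow multiplications can be illustrated with a minimal sketch. It is not the SDMM method itself, whose approximation step the abstract does not detail. It assumes unsigned operands and an unlimited multiplier width, whereas a real DSP block has fixed operand widths. The names `packed_multiply` and `dsp_reduction` are hypothetical. The second helper only checks arithmetic: the reported reductions are consistent with 3, 4, and 6 multiplications sharing one DSP block, although the abstract does not state this mapping explicitly.

```python
def packed_multiply(weights, x, weight_bits, x_bits):
    """Multiply several unsigned weights by one unsigned input
    using a single wide multiplication (illustrative only)."""
    if not 0 <= x < (1 << x_bits):
        raise ValueError("x does not fit in x_bits")
    # Each product w * x is smaller than 2**(weight_bits + x_bits),
    # so giving every weight a lane of that width prevents overlap.
    lane = weight_bits + x_bits
    packed = 0
    for i, w in enumerate(weights):
        if not 0 <= w < (1 << weight_bits):
            raise ValueError("weight does not fit in weight_bits")
        packed |= w << (i * lane)
    wide = packed * x  # the single shared multiplication
    mask = (1 << lane) - 1
    # Slice the individual products back out of the wide result.
    return [(wide >> (i * lane)) & mask for i in range(len(weights))]


def dsp_reduction(mults_per_dsp):
    """Fraction of DSP blocks saved when one block performs
    mults_per_dsp multiplications instead of one."""
    return 1.0 - 1.0 / mults_per_dsp


if __name__ == "__main__":
    print(packed_multiply([3, 5, 7], 9, weight_bits=4, x_bits=4))
    for n in (3, 4, 6):
        print(n, round(100 * dsp_reduction(n), 1))
```

Exact packing of this kind needs an operand of `len(weights) * (weight_bits + x_bits)` bits, which grows quickly, so fitting several products into a fixed-width multiplier generally requires some manipulation of the parameters.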
The computational capabilities of neuromorphic analog/mixed-signal spiking neural networks offer a capable platform for implementing cognitive tasks on resource-limited embedded platforms. In this paper, we derive a stochastic model of spiking neural processing systems for energy-efficient recognition and inference of biomedical systems. We examine imperfections in the network dynamics and noise-induced information processing, the influence of the uncertainty on the behavior of the emulated networks, and the impact on the clustering accuracy of cardiac arrhythmia. Experimental results indicate that stochasticity at network connections is an adequate resource for deep learning machines.
In pulse-based neural networks, synaptic dynamics can directly influence the learning of neural codes and the encoding of spatiotemporal spike patterns. In this paper, we propose an adaptive synapse circuit for increased flexibility and efficacy of signal processing units in neuromorphic structures. The synapse acts as a multi-layer computational network, and includes multi-compartment dendrites and different types of post-synaptic back propagating signals. With built-in temporal control mechanisms, the resulting reconfigurable network allows the implementation of synaptic homeostatics.
Advanced driving assistance systems (ADAS) prepare regulators, consumers and corporations for the medium-term reality of autonomous driving with adaptive cruise control, collision avoidance and lane departure warning systems. Various sensors, such as camera, RADAR and LIDAR, are integrated into the vehicle to assist driving. In addition, deep learning approaches are used in a wide range of applications, from object detection and scene segmentation to engine fault diagnosis, emission management and vehicle network intrusion detection. In this paper, we survey state-of-the-art sensor subsystems in terms of their functionality, characteristics, specifications and communication protocols, and we describe the cognitive deep-learning-based algorithms required for environment perception through these sensors. Subsequently, we analyze the cognitive algorithms by profiling standard deep learning models, and explore different compute platforms and possible algorithm and hardware optimization scenarios.
Synaptic dynamics is of great importance in realizing biophysically accurate neural behaviors and efficient synaptic learning in neuromorphic integrated circuits. In this paper, we propose a current-based synapse structure with multi-compartment receptors AMPA, NMDA and GABAa and a weight-dependent learning algorithm. The designed circuit offers the distinctive dynamic features of each receptor as well as a joint synaptic function. A cross-correlation methodology is applied to a two-layer RNN built from multi-compartment receptors to demonstrate the proposed synapse structure. Increased computation efficiency is verified through temporal synchrony detection among the neural layers in a noisy environment. The design, implemented in TSMC 65 nm CMOS technology, consumes 1.92, 3.36, 1.11 and 35.22 pJ of energy per spike event for AMPA, NMDA, GABAa and the advanced learning circuit, respectively.
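Cross-correlation as a tool for detecting temporal synchrony between spike trains can be sketched in a few lines. The example below is a generic illustration, not the methodology of the paper. It assumes spike trains are given as equal-rate binary lists (1 for a spike in a time bin), and the function names `cross_correlation` and `best_lag` are hypothetical.

```python
def cross_correlation(a, b, max_lag):
    """Count coincident spikes between binary trains a and b for
    each lag in [-max_lag, max_lag]. A positive lag means that
    spikes in b occur that many bins after spikes in a."""
    n = min(len(a), len(b))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                total += a[t] * b[u]
        result[lag] = total
    return result


def best_lag(a, b, max_lag):
    """Return the lag with the most coincidences, preferring the
    smallest absolute lag when several lags tie."""
    corr = cross_correlation(a, b, max_lag)
    return max(corr, key=lambda lag: (corr[lag], -abs(lag)))


if __name__ == "__main__":
    layer_1 = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
    # Same pattern delayed by two time bins.
    layer_2 = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
    print(best_lag(layer_1, layer_2, max_lag=4))
```

A pronounced peak at a single lag suggests that the two trains are synchronized up to that delay, while a flat correlation suggests unrelated activity.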
Simulating large spiking neural networks with a high level of realism in an FPGA requires efficient network architectures that satisfy both the resource and interconnect constraints and accommodate the changes in traffic patterns caused by learning processes. In this paper, we propose a dataflow architecture based on a multipath ring topology that offers traffic shaping capabilities and high energy efficiency for neuron-to-neuron communication.
The pathophysiological processes underlying the ECG tracing produce significant variations in heart rate and morphological pattern, both between patients and within the same patient under different physical or temporal conditions. Within this framework, spiking neural networks (SNN) may be a compelling approach to ECG pattern classification based on the individual characteristics of each patient. In this paper, we study electrophysiological dynamics in the self-organizing map SNN when the coefficients of the neuronal connectivity matrix are random variables. We examine synchronicity and noise-induced information processing, the influence of the uncertainty on the system signal-to-noise ratio, and the impact on the clustering accuracy of cardiac arrhythmia.
Real-time simulation of brain neurons using biophysically meaningful models is a prerequisite for a comprehensive understanding of how neurons process information and communicate with each other, in effect efficiently complementing in-vivo experiments. State-of-the-art neuron simulators are, however, capable of simulating at most a few tens or hundreds of biophysically accurate neurons in real time, because interneuron communication costs grow exponentially with the number of simulated neurons. In this paper, we propose a real-time, reconfigurable, multichip system architecture based on localized communication, which effectively reduces the growth of the communication cost to linear. All parts of the system are generated automatically, based on the neuron connectivity scheme. Experimental results indicate that the proposed system architecture provides a capacity of over 3000 to 19 200 (depending on the connectivity scheme) biophysically accurate neurons over multiple chips.
Fighting Dark Silicon
Toward Realizing Efficient Thermal-Aware 3-D Stacked Multiprocessors