D. Aledo Ortega
Please Note
<p>This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.</p>
2 records found
1
Jumping Shift: A Logarithmic Quantization Method for Low-Power CNN Acceleration
Logarithmic quantization for Convolutional Neural Networks (CNNs) a) fits typical weight and activation distributions well, and b) allows the multiplication operation to be replaced by a shift operation that can be implemented with fewer hardware resources. We propose a new quantization method named Jumping Log Quantization (JLQ). The key idea of JLQ is to extend the quantization range by adding a coefficient parameter “s” to the power-of-two exponents $(2^{sx+i})$. This quantization strategy skips some values of the standard logarithmic quantization. In addition, we develop a small hardware-friendly optimization called weight de-zero. Zero-valued weights, which cannot be implemented by a single shift operation, are all replaced with logarithmic weights to reduce hardware resources with almost no accuracy loss. To implement the Multiply-And-Accumulate (MAC) operation (needed to compute convolutions) when the weights are JLQ-ed and de-zeroed, a new Processing Element (PE) has been developed. This new PE uses a modified barrel shifter that can efficiently avoid the skipped values. Resource utilization, area, and power consumption of the standalone PE are reported. We have found that JLQ performs better than other state-of-the-art logarithmic quantization methods when the bit width of the operands becomes very small.
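The idea of replacing multiplications with shifts can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the code range for x (non-positive integers), the nearest-level rounding in the log2 domain, and the handling of zero weights are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def jlq_levels(s, i, n_codes):
    """Exponents s*x + i for x = -(n_codes - 1), ..., 0.

    The code range for x is an assumption; with s = 1 this is plain
    logarithmic quantization, with s > 1 some exponents are skipped.
    """
    return [s * x + i for x in range(-(n_codes - 1), 1)]

def quantize_log(w, exponents):
    """Map a weight to (sign, exponent) of the nearest level in the log2 domain."""
    if w == 0:
        # Assumption echoing "weight de-zero": a zero weight is replaced
        # by the smallest available power-of-two level.
        return 1, min(exponents)
    sign = -1 if w < 0 else 1
    target = math.log2(abs(w))
    exp = min(exponents, key=lambda e: abs(e - target))
    return sign, exp

def shift_multiply(a, sign, exp):
    """Multiply an integer activation by sign * 2**exp using shifts only."""
    shifted = a << exp if exp >= 0 else a >> -exp
    return sign * shifted

# Four codes (2 bits): s = 1 covers 2^-3..2^0, s = 2 covers 2^-6..2^0.
standard = jlq_levels(1, 0, 4)
jumping = jlq_levels(2, 0, 4)
sign, exp = quantize_log(0.3, jumping)
product = shift_multiply(64, sign, exp)
```

With the same number of codes, the larger coefficient `s` widens the range of representable exponents at the cost of skipping intermediate ones, which is the trade-off the abstract describes.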
Conference paper
(2022)
-
D. Aledo Ortega, T. Manjunath, R.T. Rajan, Darek Maksimiuk, T.G.R.M. van Leuken
In multi-sensor systems, several sensors produce data streams, commonly at different frequencies. If they are left running without synchronization, after a period of time they are likely to become disordered, presenting as simultaneous measurements that were in fact recorded at different times. That can be disastrous in many data fusion applications. This paper is about their temporal synchronization and ordering, so that they can be coherently fused. Some sensors do not have timestamps from which to order the streams, and even if they do, the timestamps may not be trustworthy for different reasons. First, we define mathematically the problem of multi-sensor data stream synchronization. Then, we handle the problem of estimating the actual time of sensor measurement using mean or median filters. Next, we address the issue of reconstructing incoming sensor data streams according to the estimated sensor measurement times while maintaining minimal latency and synchronization error by employing an adaptive stream buffering technique used in distributed multimedia systems. In order to test our methods, we have recorded an easy-to-use dataset with a radar and a lidar sensor without timestamps. We define a synchronization event that is easily identifiable by a human annotator in both sensor streams. From this dataset, a suitable filter for timestamp estimation is selected, and an analysis of the effects of the stream synchronization algorithm’s parameters on buffering latency and synchronization error is presented. Finally, the solution is efficiently implemented on an FPGA.
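The timestamp estimation step can be sketched in Python as follows. This is only one possible reading of "median filter" under stated assumptions, not the method from the paper: it assumes a nominally periodic sensor, uses arrival times (in milliseconds) as the only available information, and smooths the inter-arrival intervals with a causal sliding median. The function names and the window size are hypothetical.

```python
from statistics import median

def median_filter(values, window):
    """Causal sliding median over the last `window` values."""
    out = []
    for k in range(len(values)):
        start = max(0, k - window + 1)
        out.append(median(values[start:k + 1]))
    return out

def estimate_timestamps(arrivals, window=3):
    """Estimate measurement times from jittery arrival times.

    Assumption: the sensor samples at a roughly constant period, so a
    median over recent inter-arrival intervals rejects delivery jitter.
    """
    intervals = [b - a for a, b in zip(arrivals, arrivals[1:])]
    smoothed = median_filter(intervals, window)
    estimates = [arrivals[0]]
    for dt in smoothed:
        estimates.append(estimates[-1] + dt)
    return estimates

# One sample (nominally at 300 ms) is delivered late, at 450 ms.
arrivals_ms = [0, 100, 200, 450, 500, 600]
estimated_ms = estimate_timestamps(arrivals_ms)
```

In this example the late delivery produces one long and one short interval, both of which the median rejects, so the estimated times stay on the 100 ms grid. A mean filter would instead spread the outlier over the window.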