B. Zhu

6 records found

Conference paper (2021) - Baozhou Zhu, Peter Hofstee, Jinho Lee, Zaid Al-Ars
The attention mechanism has been regarded as an advanced technique to capture long-range feature interactions and to boost the representation capability of convolutional neural networks. However, we found two overlooked problems in current attentional activations-based models: the approximation problem and the insufficient capacity problem of the attention maps. To solve the two problems together, we propose an attention module for convolutional neural networks by developing an AW-convolution, where the shape of the attention maps matches that of the weights rather than the activations. Our proposed attention module is complementary to previous attention-based schemes, such as those that apply the attention mechanism to explore the relationship between channel-wise and spatial features. Experiments on several datasets for image classification and object detection tasks show the effectiveness of our proposed attention module. In particular, our proposed attention module achieves a 1.00% Top-1 accuracy improvement on ImageNet classification over a ResNet101 baseline and a 0.63 COCO-style Average Precision improvement on COCO object detection on top of a Faster R-CNN baseline with a ResNet101-FPN backbone. When integrated with previous attentional activations-based models, our proposed attention module can further increase their Top-1 accuracy on ImageNet classification by up to 0.57% and their COCO-style Average Precision on COCO object detection by up to 0.45. Code and pre-trained models will be publicly available. ...
Doctoral thesis (2021) - B. Zhu
In recent years, the accuracy of Deep Neural Networks (DNNs) has improved significantly because of three main factors: the availability of massive amounts of training data, the introduction of powerful low-cost computational resources, and the development of complex deep learning models. The cloud can provide powerful computational resources to compute DNNs, but data communication and privacy issues limit their deployment there. Thus, computing DNNs at the edge is becoming an important alternative to calculating these models in a centralized service. However, there is a mismatch between the resource-constrained devices at the edge and the models with increased computational complexity. To alleviate this mismatch, both the algorithms and the hardware need to be explored to improve the efficiency of training various feedforward and recurrent neural networks and of running inference with a DNN. ...

Neural Architecture Search for Binary Convolutional Neural Networks

Conference paper (2020) - Baozhou Zhu, Zaid Al-Ars, H. Peter Hofstee
Binary Convolutional Neural Networks (CNNs) have significantly reduced the number of arithmetic operations and the size of memory storage needed for CNNs, which makes their deployment on mobile and embedded systems more feasible. However, after binarization, the CNN architecture has to be redesigned and refined significantly for two reasons: (1) the large accumulation error of binarization in the forward propagation, and (2) the severe gradient mismatch problem of binarization in the backward propagation. Even though substantial effort has been invested in designing architectures for single and multiple binary CNNs, it is still difficult to find an optimized architecture for binary CNNs. In this paper, we propose a strategy, named NASB, which adapts Neural Architecture Search (NAS) to find an optimized architecture for the binarization of CNNs. In the NASB strategy, the operations and their connections define a unique search space, and the training and binarization of the network proceed in a three-stage training algorithm. Due to the flexibility of this automated strategy, the obtained architecture is not only suitable for binarization but also has low overhead, achieving a better trade-off between accuracy and computational complexity compared to hand-optimized binary CNNs. The implementation of the NASB strategy is evaluated on the ImageNet dataset and demonstrated to be a better solution than existing quantized CNNs. With an insignificant increase in overhead, NASB outperforms existing single and multiple binary CNNs by up to 4.0% and 1.0% Top-1 accuracy respectively, bringing them closer to the accuracy of their full-precision counterpart. ...

Reducing approximation of channels by reducing feature reuse within convolution

Journal article (2020) - Baozhou Zhu, Zaid Al-Ars, H. Peter Hofstee
High-level feature maps of Convolutional Neural Networks are computed by reusing their corresponding low-level feature maps, which exploits feature reuse to improve computational efficiency. This form of feature reuse is referred to as feature reuse between convolutional layers. The second type of feature reuse is referred to as feature reuse within the convolution, where the channels of the output feature maps of the convolution are computed by reusing the same channels of the input feature maps, which results in an approximation of the channels of the output feature maps. To compute them accurately, we need specialized input feature maps for every channel of the output feature maps. In this paper, we first discuss the approximation problem introduced by full feature reuse within the convolution and then propose a new feature reuse scheme called Reducing Approximation of channels by Reducing Feature reuse (REAF). The paper also shows that group convolution is a special case of our REAF scheme, and we analyze the advantage of REAF compared to such group convolution. Moreover, we develop the REAF+ scheme and integrate it with group convolution-based models. Compared with baselines, experiments on image classification demonstrate the effectiveness of our REAF and REAF+ schemes. Under the given computational complexity budget, the Top-1 accuracy of REAF-ResNet50 and REAF+-MobileNetV2 on ImageNet increases by 0.37% and 0.69% respectively. The code and pre-trained models will be publicly available. ...
Book chapter (2020) - Baozhou Zhu, Zaid Al-Ars, Wei Pan
Binary Convolutional Neural Networks (CNNs) can significantly reduce the number of arithmetic operations and the size of memory storage, which makes the deployment of CNNs on mobile or embedded systems more promising. However, the accuracy degradation of single and multiple binary CNNs is unacceptable for modern architectures and large-scale datasets like ImageNet. In this paper, we propose a Piecewise Approximation (PA) scheme for multiple binary CNNs which lessens accuracy loss by approximating full-precision weights and activations efficiently, and maintains the parallelism of bitwise operations to guarantee efficiency. Unlike previous approaches, the proposed PA scheme segments the full-precision weights and activations piecewise and approximates each piece with a scaling coefficient. Our implementation on ResNet with different depths on ImageNet can reduce both the Top-1 and Top-5 classification accuracy gap compared with full precision to approximately 1.0%. Benefiting from the binarization of the downsampling layer, our proposed PA-ResNet50 requires less memory usage and two times Flops than single binary CNNs with 4 weights and 5 activations bases. The PA scheme can also generalize to other architectures like DenseNet and MobileNet with similar approximation power as ResNet, which is promising for other tasks using binary convolutions. The code and pre-trained models will be publicly available. ...
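The bitwise parallelism that binary CNNs rely on can be illustrated with the well-known XNOR-popcount identity for dot products of +1/-1 vectors. This is a minimal sketch of that general identity only, not the PA scheme from the abstract; the helper names pack and binary_dot are hypothetical.

```python
def pack(signs):
    """Pack a list of +1/-1 values into an int (bit i is 1 when signs[i] == +1)."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    Positions where the bits agree contribute +1, the others -1,
    so the result is 2 * (number of agreements) - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR restricted to n bits
    return 2 * bin(agree).count("1") - n


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
print(binary_dot(pack(a), pack(b), len(a)))
```

A scheme with several binary bases and scaling coefficients would, under this view, combine several such bitwise dot products with floating-point weights; how the pieces and coefficients are chosen is specific to the paper and not shown here.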
Convolutional Neural Networks (CNNs) are a class of widely used deep artificial neural networks. However, training large CNNs to produce state-of-the-art results can take a long time. In addition, we need to reduce the compute time of the inference stage for trained networks to make them usable in real-time applications. To achieve this, the reduced-precision integer number formats INT8 and INT16 are being used to create Integer Convolutional Neural Networks (ICNNs) that can be deployed on mobile devices or embedded systems. In this paper, the Diminished-1 Fermat Number Transform (DFNT), which refers to the Fermat Number Transform (FNT) with diminished-1 number representation, is proposed to accelerate ICNNs through algebraic properties of integer convolution. This is achieved by performing the convolution step as diminished-1 point-wise products between DFNT-transformed feature maps, which can be reused multiple times in the calculation. Since representing and computing all the integers in the ring of integers modulo the Fermat number 2^b + 1 for FNT requires b+1 bits, the diminished-1 number representation is used to enable exact and efficient calculation. Using DFNT, integer convolution is implemented on a general-purpose processor, showing a speedup of 2-3x with typical parameter configurations and better scalability without any round-off error compared to the baseline. ...
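The idea of computing an integer convolution exactly as point-wise products in a transform domain can be illustrated with a small sketch. This is not the paper's DFNT implementation: it assumes ordinary residues modulo the Fermat number 2^16 + 1 rather than the diminished-1 representation, uses a naive O(n^2) transform, and the function names fnt and linear_conv are hypothetical. It assumes Python 3.8 or later for modular inverses via pow.

```python
F = 2**16 + 1  # Fermat number 65537 (prime); plays the role of 2^b + 1 with b = 16


def fnt(x, inverse=False):
    """Naive number-theoretic transform modulo F with a power of 2 as root.

    2 has multiplicative order 32 modulo F (2^16 = -1 mod F), so any length
    n dividing 32 has the n-th root of unity 2^(32/n).
    """
    n = len(x)
    assert 32 % n == 0, "length must divide 32 when the root is a power of 2"
    w = pow(2, 32 // n, F)
    if inverse:
        w = pow(w, -1, F)
    out = [sum(v * pow(w, j * k, F) for j, v in enumerate(x)) % F
           for k in range(n)]
    if inverse:
        n_inv = pow(n, -1, F)
        out = [(v * n_inv) % F for v in out]
    return out


def linear_conv(a, b):
    """Exact linear convolution of two short integer sequences.

    Results are correct as long as every true output value lies in
    the range -F//2 .. F//2.
    """
    m = len(a) + len(b) - 1
    n = 1
    while n < m:          # zero-pad to a power of two so circular = linear
        n *= 2
    A = fnt(list(a) + [0] * (n - len(a)))
    B = fnt(list(b) + [0] * (n - len(b)))
    C = fnt([(x * y) % F for x, y in zip(A, B)], inverse=True)
    return [v - F if v > F // 2 else v for v in C[:m]]  # map back to signed


print(linear_conv([1, 2, 3], [4, 5, 6]))
```

Because all arithmetic is modular integer arithmetic, there is no round-off error, which is the property the abstract contrasts with the baseline. Choosing a power of 2 as the root means the twiddle multiplications reduce to bit shifts in a hardware or low-level implementation, although this sketch does not exploit that.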