Ultra-low-latency deep neural network inference for gravitational wave interferometers

Abstract

Research into low-latency Convolutional Neural Network (CNN) inference is gaining momentum for tasks such as speech and image classification, because CNNs are able to surpass human accuracy in image classification. To improve the measurement setup of a gravitational wave interferometer, low-latency CNN inference is investigated. The CNN must process image data to enable certain automatic controls in the control system: within 0.1 ms of taking an image, the control system has to obtain the result of the deep neural network. Hardware acceleration is needed to reduce the execution latency of the network and reach the 0.1 ms requirement. Field-Programmable Gate Arrays (FPGAs) in particular can provide this acceleration, because they allow the layers of the network to be highly customised and thus achieve the lowest possible latency.

To reduce the design effort and complexity of the machine learning design, Xilinx introduced the FINN (Fast, Scalable Quantized Neural Network Inference on FPGAs) framework. FINN is an end-to-end deep learning framework that generates dataflow-style architectures customised for each network. To establish whether FINN can create the required ultra-low-latency CNN, some of FINN's pretrained networks are used. The first network investigated is the Tiny Fully Connected (TFC) network, a multilayer perceptron (MLP) for MNIST classification with three fully connected layers. The second is the convolutional neural network named CNV, a derivative of the VGG16 topology, which is used for deep learning image classification problems and contains multiple convolutional layers.

With the analysis tools included in FINN, it can be determined whether FINN is able to create the required ultra-low-latency CNN. The TFC network can be parallelised down to a total of 5 expected cycles, with 1 expected cycle per layer: one for the input quantization implemented as standalone thresholding, three for the fully connected layers and one for the output layer. For the CNV network, on the other hand, the initial convolution layer cannot go below 8196 expected cycles because of certain bottlenecks in FINN. These bottlenecks arise from how FINN implements certain layers and from layers that simply cannot be parallelised any further to lower their latency. To see whether CNV could still meet the latency requirement, a software emulation of the execution of the network was performed. This emulation showed that by progressively increasing the parallelisation parameters together with the clock frequency, it is possible to create an ultra-low-latency CNN pipeline. The resulting configuration has 45866 expected cycles in total for the network, 8196 expected cycles for its slowest layer, and needs a minimum clock frequency of 200 MHz. With this configuration it is possible to create a pipeline with a latency below 0.1 ms.
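The cycle and latency figures above follow from a simple relation between per-layer cycle counts, the parallelisation (folding) parameters and the clock frequency. The sketch below is a minimal illustration of that relation, not FINN's actual API: the function names and the simplified folding formula for a fully connected layer are assumptions for illustration, while the 8196-cycle and 200 MHz figures are the ones quoted in the abstract.

    # Minimal sketch (assumed names, not FINN's actual API) of how per-layer
    # expected cycles and latency relate to the folding (parallelisation)
    # parameters and the clock frequency.

    def fc_expected_cycles(in_features: int, out_features: int,
                           simd: int, pe: int) -> int:
        # Simplified folding model for a fully connected layer: each of the
        # out_features // pe output groups takes in_features // simd cycles.
        assert in_features % simd == 0 and out_features % pe == 0
        return (in_features // simd) * (out_features // pe)

    def cycles_to_latency_us(cycles: int, clk_mhz: float) -> float:
        # cycles / (clk_mhz * 1e6 Hz) seconds, expressed in microseconds.
        return cycles / clk_mhz

    # A fully parallelised layer (SIMD = number of inputs, PE = number of
    # outputs) needs a single expected cycle, as for the TFC layers above
    # (the layer width of 64 is illustrative).
    assert fc_expected_cycles(64, 64, simd=64, pe=64) == 1

    # The slowest CNV layer is stuck at 8196 expected cycles; at the minimum
    # clock frequency of 200 MHz this corresponds to roughly 41 microseconds.
    print(cycles_to_latency_us(8196, clk_mhz=200.0))  # ~40.98

In this simplified view, raising the folding parameters divides a layer's cycle count while raising the clock frequency shortens each cycle, which are exactly the two levers the software emulation explores to bring the pipeline below the 0.1 ms target.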