Elastic-DF

Scaling Performance of DNN Inference in FPGA Clouds through Automatic Partitioning

Journal Article (2022)
Author(s)

Tobias Alonso (Universidad Autónoma de Madrid)

Lucian Petrica (Xilinx Research)

Mario Ruiz (Xilinx University Program)

Jakoba Petri-Koenig (TU Delft - Computer Engineering)

Yaman Umuroglu (Xilinx Research)

Ioannis Stamelos (InAccel)

Elias Koromilas (InAccel)

Michaela Blott (Xilinx Research)

Kees Vissers (Xilinx Research)

Research Group
Computer Engineering
DOI related publication
https://doi.org/10.1145/3470567
Publication Year
2022
Language
English
Issue number
2
Volume number
15
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Customized compute acceleration in the datacenter is key to the wider roll-out of applications based on deep neural network (DNN) inference. In this article, we investigate how to maximize the performance and scalability of field-programmable gate array (FPGA)-based pipeline dataflow DNN inference accelerators (DFAs) automatically on computing infrastructures consisting of multi-die, network-connected FPGAs. We present Elastic-DF, a novel resource partitioning tool and associated FPGA runtime infrastructure that integrates with the DNN compiler FINN. Elastic-DF allocates FPGA resources to DNN layers and layers to individual FPGA dies to maximize the total performance of the multi-FPGA system. In the resulting Elastic-DF mapping, the accelerator may be instantiated multiple times, and each instance may be segmented across multiple FPGAs transparently, whereby the segments communicate peer-to-peer through 100 Gbps Ethernet FPGA infrastructure, without host involvement. When applied to ResNet-50, Elastic-DF provides a 44% latency decrease on Alveo U280. For MobileNetV1 on Alveo U200 and U280, Elastic-DF enables a 78% throughput increase, eliminating the performance difference between these cards and the larger Alveo U250. Elastic-DF also increases operating frequency in all our experiments, on average by over 20%. Elastic-DF therefore increases performance portability between different sizes of FPGA and increases the critical throughput per cost metric of datacenter inference.
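
The mapping of layers to FPGA dies described in the abstract can be pictured with a toy example. The sketch below is not the Elastic-DF partitioner, whose formulation this record does not give. It is a minimal illustration that assumes each layer has a single scalar resource cost, that layers placed on one die must be consecutive in the pipeline, and that the goal is to minimize the load on the most heavily loaded die. The function name `partition_layers` and the cost values are hypothetical.

```python
from itertools import accumulate


def partition_layers(costs, num_dies):
    """Split consecutive layers into contiguous segments, one per die.

    Illustrative only: minimizes the largest per-die total cost using a
    classic linear-partition dynamic program. Returns (segments, bottleneck),
    where segments is a list of lists of layer indices.
    """
    n = len(costs)
    if n == 0:
        return [], 0
    k = min(num_dies, n)  # never use more dies than there are layers
    prefix = [0] + list(accumulate(costs))
    inf = float("inf")
    # best[d][i]: smallest achievable maximum load when the first i layers
    # are placed on exactly d dies (each die gets at least one layer).
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for d in range(1, k + 1):
        for i in range(d, n + 1):
            for j in range(d - 1, i):
                # Layers j..i-1 go on die d; the rest are already placed.
                load = max(best[d - 1][j], prefix[i] - prefix[j])
                if load < best[d][i]:
                    best[d][i] = load
                    cut[d][i] = j
    # Walk the recorded cut points backwards to recover the segments.
    segments = []
    i = n
    for d in range(k, 0, -1):
        j = cut[d][i]
        segments.append(list(range(j, i)))
        i = j
    segments.reverse()
    return segments, best[k][n]


if __name__ == "__main__":
    # Hypothetical per-layer resource costs (arbitrary units).
    layer_costs = [4, 2, 3, 7, 1, 5]
    segs, bottleneck = partition_layers(layer_costs, num_dies=3)
    print(segs, bottleneck)
```

A real partitioner would also have to account for several resource types per die, for the bandwidth of die-to-die and FPGA-to-FPGA links, and for the throughput of each layer, none of which this sketch models.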

Files

3470567.pdf
(pdf | 2.82 MB)
Embargo expired on 01-07-2023
License info not available