Elastic-DF

Scaling Performance of DNN Inference in FPGA Clouds through Automatic Partitioning

Journal Article (2022)

Author(s)

Tobias Alonso (Universidad Autónoma de Madrid)

Lucian Petrica (Xilinx Research)

Mario Ruiz (Xilinx University Program)

Jakoba Petri-Koenig (TU Delft - Computer Engineering)

Yaman Umuroglu (Xilinx Research)

Ioannis Stamelos (InAccel)

Elias Koromilas (InAccel)

Michaela Blott (Xilinx Research)

Kees Vissers (Xilinx Research)

Research Group

Computer Engineering

DOI related publication

https://doi.org/10.1145/3470567

Keywords: Deep neural networks; Partitioning; Distributed inference

To reference this document use:

https://resolver.tudelft.nl/uuid:6a2e4181-2cae-44f4-9ecd-e0ff53136406

Publication Year

2022

Language

English

Issue number

2

Volume number

15

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Customized compute acceleration in the datacenter is key to the wider roll-out of applications based on deep neural network (DNN) inference. In this article, we investigate how to automatically maximize the performance and scalability of field-programmable gate array (FPGA)-based pipeline dataflow DNN inference accelerators (DFAs) on computing infrastructures consisting of multi-die, network-connected FPGAs. We present Elastic-DF, a novel resource partitioning tool and associated FPGA runtime infrastructure that integrates with the DNN compiler FINN. Elastic-DF allocates FPGA resources to DNN layers, and layers to individual FPGA dies, to maximize the total performance of the multi-FPGA system. In the resulting Elastic-DF mapping, the accelerator may be instantiated multiple times, and each instance may be transparently segmented across multiple FPGAs, with the segments communicating peer-to-peer over a 100 Gbps Ethernet FPGA infrastructure without host involvement. When applied to ResNet-50, Elastic-DF provides a 44% latency decrease on Alveo U280. For MobileNetV1 on Alveo U200 and U280, Elastic-DF enables a 78% throughput increase, eliminating the performance difference between these cards and the larger Alveo U250. Elastic-DF also increases operating frequency in all our experiments, by over 20% on average. Elastic-DF therefore improves performance portability across FPGAs of different sizes and increases throughput per cost, a critical metric for datacenter inference.
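To make the abstract's ideas concrete (segmenting a dataflow pipeline across dies, and replicating the accelerator to fill the remaining capacity), here is a minimal sketch under strong simplifying assumptions. It is not Elastic-DF's algorithm: the greedy contiguous packing, the single "luts" resource, the per-layer "cycles" figures, and all names (Layer, partition_contiguous, pipeline_fps, system_fps) are hypothetical illustrations. It ignores inter-die link bandwidth, per-layer folding choices, and frequency effects.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    luts: int    # hypothetical resource cost of this layer on one die
    cycles: int  # hypothetical cycles per frame for this pipeline stage


def partition_contiguous(layers, die_capacity):
    """Greedily pack consecutive layers onto dies (a simple stand-in heuristic).

    Each returned segment is a list of consecutive layers that fits on one die.
    """
    segments, current, used = [], [], 0
    for layer in layers:
        if layer.luts > die_capacity:
            raise ValueError(f"{layer.name} does not fit on a single die")
        if used + layer.luts > die_capacity:
            # Current die is full: close this segment and start a new die.
            segments.append(current)
            current, used = [], 0
        current.append(layer)
        used += layer.luts
    if current:
        segments.append(current)
    return segments


def pipeline_fps(layers, clock_hz):
    # A dataflow pipeline's throughput is bounded by its slowest stage.
    return clock_hz / max(layer.cycles for layer in layers)


def system_fps(layers, die_capacity, num_dies, clock_hz):
    # One accelerator instance spans len(segments) dies; replicate it
    # as many times as the available dies allow.
    dies_per_instance = len(partition_contiguous(layers, die_capacity))
    replicas = num_dies // dies_per_instance
    return replicas * pipeline_fps(layers, clock_hz)


layers = [
    Layer("conv1", 300, 1000),
    Layer("conv2", 500, 2000),
    Layer("conv3", 400, 1500),
    Layer("fc", 200, 500),
]
segments = partition_contiguous(layers, die_capacity=800)
print([[layer.name for layer in seg] for seg in segments])
print(system_fps(layers, die_capacity=800, num_dies=6, clock_hz=200e6))
```

With these made-up numbers, one instance needs two dies ([conv1, conv2] and [conv3, fc]), so six dies host three replicas, each limited by conv2's 2000 cycles per frame. A real tool would also trade resources against per-layer throughput rather than treating both as fixed.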

Files

3470567.pdf

(pdf | 2.82 Mb)

- Embargo expired on 01-07-2023

License info not available