DNN-porting for maximizing inference hardware utilisation at the Edge


Abstract

Industry 4.0 and the growth of the Industrial Internet of Things (IIoT) will result in an explosion of data generated by connected devices. Adopting 5G and 6G technology could be the leading enabler of connecting IIoT devices at scale. However, edge solutions have disadvantages, such as the loss of resource elasticity compared to cloud solutions. The research questions of this thesis are whether Deep Neural Network (DNN) porting can resolve the accuracy-performance trade-off of edge computing solutions, and how to implement an edge computing system based on open-source, container-orchestrated DNN model inference platforms with vertical model autoscaling capabilities.
The thesis shows how porting techniques such as structured pruning enable navigating the accuracy-performance trade-off of DNNs in hardware-constrained settings, producing models of reduced complexity and size while degrading accuracy only minimally. By serving these ported models in the proposed inference platform, the thesis demonstrates how an edge computing system can achieve vertical model autoscaling, enabling efficient use of computational resources. This research focuses on CPU hardware and Real-Time (RT) request scenarios, where the latency Service Level Objective (SLO) combined with current demand are the crucial factors. When an inference system's resources are depleted, the latency of individual requests can increase significantly due to queuing. The results show how an orchestrator can make live model-version selections based on the available model versions and current demand. The proposed system increases the maximum achievable throughput compared to the state of the art while avoiding queue build-up in the RT scenario, and it improves system accuracy when CPU resources are available. Additionally, this work proposes a design for implementing these benefits in industry-adopted open-source DNN inference platforms.
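To make the porting step concrete, the following is a minimal sketch of structured pruning using PyTorch's built-in pruning utilities. The framework, the toy model, and the 50% pruning ratio are illustrative assumptions; the abstract does not prescribe a specific toolchain or configuration.

```python
# Minimal sketch of structured pruning (illustrative; not the thesis's exact setup).
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
)

# Zero out 50% of the first layer's output filters (dim=0), ranked by L2 norm.
# Because whole filters are pruned rather than scattered individual weights,
# the zeroed channels can later be removed entirely in an export step,
# yielding a ported model version with lower complexity and size at a small
# cost in accuracy.
prune.ln_structured(model[0], name="weight", amount=0.5, n=2, dim=0)
prune.remove(model[0], "weight")  # bake the pruning mask into the weights
```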
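The orchestrator's live model-version selection can be illustrated by the hypothetical sketch below. The names (ModelVersion, pick_version), the single-worker utilisation check, and the selection rule are assumptions for illustration only, not the thesis's actual algorithm: the idea is to serve the most accurate ported variant that still meets the latency SLO at the current demand, falling back to a faster variant before a queue can build up.

```python
# Hypothetical SLO- and demand-aware model-version selection (illustrative only).
from dataclasses import dataclass

@dataclass
class ModelVersion:
    name: str
    accuracy: float    # validation accuracy of this ported variant
    latency_ms: float  # measured per-request CPU latency

def pick_version(versions: list[ModelVersion],
                 slo_ms: float, demand_rps: float) -> ModelVersion:
    """Pick the most accurate version that meets the latency SLO and keeps
    utilisation below 1.0 (assumes one serving worker), so no queue forms."""
    feasible = [
        v for v in versions
        if v.latency_ms <= slo_ms
        and demand_rps * v.latency_ms / 1000.0 <= 1.0
    ]
    if not feasible:  # overloaded: degrade gracefully to the fastest variant
        return min(versions, key=lambda v: v.latency_ms)
    return max(feasible, key=lambda v: v.accuracy)

versions = [
    ModelVersion("model-full",   accuracy=0.92, latency_ms=80.0),
    ModelVersion("model-pruned", accuracy=0.89, latency_ms=35.0),
]
# At 20 req/s the full model would saturate the CPU (20 * 0.08 s = 1.6 > 1.0),
# so the pruned variant is selected; at low demand the full model wins back.
print(pick_version(versions, slo_ms=100.0, demand_rps=20.0).name)  # model-pruned
```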