DNN-porting for maximizing inference hardware utilisation at the Edge

Master Thesis (2023)
Author(s)

J.L. Buijnsters (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Jan S. Rellermeyer – Mentor (Leibniz University of Hannover)

Aaron Ding – Graduation committee member (TU Delft - Information and Communication Technology)

P. Pawełczak – Graduation committee member (TU Delft - Embedded Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Jan Buijnsters
Publication Year
2023
Language
English
Graduation Date
26-06-2023
Awarding Institution
Delft University of Technology
Programme
Computer Science | Artificial Intelligence
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Industry 4.0 and the growth of the Industrial Internet of Things (IIoT) will result in an explosion of data generated by connected devices. Adopting 5G and 6G technology could be the leading enabler for connecting IIoT devices at scale. Processing this data at the edge, close to where it is generated, avoids round-trips to the cloud, but edge solutions have disadvantages, such as the loss of the resource elasticity that cloud solutions offer. The research questions of this thesis are whether Deep Neural Network (DNN) porting can solve the accuracy-performance trade-off of edge computing solutions, and how to implement an edge computing system based on open-source, container-orchestrated DNN model inference platforms to enable vertical model autoscaling capabilities.
The thesis shows how porting techniques such as structured pruning enable navigating the accuracy-performance trade-off of DNNs in hardware-constrained settings: pruning generates models with reduced complexity and size while only minimally degrading accuracy. By serving these ported models on the proposed inference platform, the thesis demonstrates how an edge computing system can achieve vertical model autoscaling, enabling efficient use of computational resources. This research focuses on CPU hardware and Real-Time (RT) request scenarios, where the latency Service Level Objective (SLO) and the current demand are the crucial factors. When an inference system's resources are depleted, the latency of individual requests can increase significantly due to queuing. The results show how an orchestrator can make live model-version selections based on the available model versions and the current demand. The proposed system increases the maximum achievable throughput compared to the state of the art while avoiding queue build-up in the RT scenario, and it improves system accuracy when spare CPU resources are available. Additionally, this work proposes a design to implement these benefits in industry-adopted open-source DNN inference platforms.
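As a concrete illustration of the porting technique named in the abstract, the following sketch applies structured pruning to convolution layers using PyTorch's built-in pruning utilities. It is a minimal example of the general technique, not the thesis's actual pipeline; the stand-in network and the 50% pruning amount are arbitrary choices for illustration.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small stand-in network; the thesis's actual models are not reproduced here.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
)

for module in model.modules():
    if isinstance(module, nn.Conv2d):
        # Structured pruning: zero the 50% of output channels (dim=0)
        # with the smallest L2 norm, rather than individual weights.
        prune.ln_structured(module, name="weight", amount=0.5, n=2, dim=0)
        prune.remove(module, "weight")  # bake the pruning mask into the weights

# Note: this zeroes whole channels but does not shrink the tensors; realizing
# the size and latency reduction requires rebuilding the layers without the
# pruned channels (e.g., with a dedicated pruning library).
```

The live model-version selection could then follow a simple feasibility rule: among the ported versions, serve the most accurate one whose per-request latency meets the SLO and whose service rate covers the current demand, so no queue forms. The sketch below encodes one such rule, assuming a single replica that processes requests sequentially; the version names, accuracies, and latencies are hypothetical placeholders, not measurements from the thesis.

```python
from dataclasses import dataclass

@dataclass
class ModelVersion:
    name: str
    accuracy: float    # validation accuracy of this ported version
    latency_ms: float  # measured per-request latency on the target CPU

def select_version(versions, demand_rps, slo_ms):
    """Pick the most accurate version that meets the latency SLO and can
    sustain the current demand (requests/s) without queuing."""
    feasible = [
        v for v in versions
        if v.latency_ms <= slo_ms and (1000.0 / v.latency_ms) >= demand_rps
    ]
    return max(feasible, key=lambda v: v.accuracy) if feasible else None

# Illustrative catalogue of ported versions of one model.
versions = [
    ModelVersion("model-full", accuracy=0.76, latency_ms=120.0),
    ModelVersion("model-pruned-50", accuracy=0.74, latency_ms=60.0),
    ModelVersion("model-pruned-80", accuracy=0.69, latency_ms=25.0),
]
print(select_version(versions, demand_rps=12.0, slo_ms=100.0))  # -> model-pruned-50
```

At 12 requests/s the full model cannot keep up (about 8.3 requests/s of service capacity), so the orchestrator falls back to the 50%-pruned version, which meets both the SLO and the demand while sacrificing the least accuracy.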
