Analysis of Loading Policies for Multi-model Inference on the Edge

Bachelor Thesis (2020)
Author(s)

Y.F. Runhaar (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Y. Chen – Mentor (TU Delft - Data-Intensive Systems)

B.A. Cox – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Amirmasoud Ghiassi – Graduation committee member (TU Delft - Data-Intensive Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2020 Yohan Runhaar
Publication Year
2020
Language
English
Graduation Date
25-06-2020
Awarding Institution
Delft University of Technology
Project
Optimizing Multiple Deep Learning Models on Edge Devices
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The rapid expansion of the Internet of Things (IoT), together with the convergence of technologies such as next-generation 5G wireless broadband, is driving a paradigm shift from cloud computing towards edge computing. Performing tasks normally handled by the cloud directly on edge devices offers several benefits, such as lower latency and stronger data privacy. However, edge devices are resource-constrained and often lack the computational and memory capacity for demanding workloads. Complex tasks such as the training and inference of a complete Deep Neural Network (DNN) are often not feasible on these devices.

In this paper we perform a novel empirical study of the various ways in which multiple deep learning inference tasks can be loaded on such edge devices. We analyse the run-time gain, under different resource limits, of various DNN layer loading policies that aim to optimize the overall run time of consecutive inference tasks. We combine this with an analysis of the memory usage and swapping behaviour observed during these inference tasks. Using these results, we show that when the memory overhead becomes too large, loading and executing DNN layers in an interleaved manner provides significant run-time gains. These findings are obtained through multiple experiments in EdgeCaffe, a purpose-built evaluation environment that is also presented in this paper.
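As a rough illustration of the trade-off the abstract refers to, the sketch below contrasts a bulk loading policy (all layer weights resident before execution) with an interleaved policy (load a layer, execute it, release it). This is not the EdgeCaffe implementation; the layer names and sizes are invented placeholders, and only the peak-memory difference between the two policies is modelled.

```python
# Minimal sketch of two DNN layer loading policies (not the EdgeCaffe code).
# Layer names and per-layer weight sizes below are hypothetical.

from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    size_mb: int  # memory occupied while this layer's weights are loaded


# Hypothetical four-layer network.
NETWORK = [Layer("conv1", 30), Layer("conv2", 60), Layer("fc1", 180), Layer("fc2", 20)]


def bulk_peak_memory(layers):
    """Bulk policy: load every layer before executing, so the whole model
    is resident at once and peak memory equals the sum of all layers."""
    return sum(layer.size_mb for layer in layers)


def interleaved_peak_memory(layers):
    """Interleaved policy: load one layer, execute it, release its weights,
    then load the next. Peak memory is bounded by the largest single layer,
    at the cost of reloading weights for every inference request."""
    return max(layer.size_mb for layer in layers)


if __name__ == "__main__":
    print("bulk loading peak memory       :", bulk_peak_memory(NETWORK), "MB")
    print("interleaved loading peak memory:", interleaved_peak_memory(NETWORK), "MB")
```

Under tight memory limits, the lower peak footprint of the interleaved policy avoids swapping, which is the mechanism behind the run-time gains reported in the thesis.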
