Multi-inference on the Edge: Scheduling Networks with Limited Available Memory


Abstract

The execution of multi-inference tasks on low-powered edge devices has become increasingly popular in recent years as a way to add value to data on-device. Optimization of such jobs has so far focused on hardware, neural network architectures, and frameworks to reduce execution time. However, it is not yet known how different scheduling policies affect the execution time of a multi-inference job. We performed an empirical study to investigate the effects of scheduling policies on multi-inference. The execution performance of multi-inference batch jobs under combinations of loading and scheduling policies was measured at varying levels of constrained memory. These results were obtained using EdgeCaffe, a framework developed to execute Caffe networks on edge-oriented devices. Our research shows that a novel scheduling policy, MeMa, can significantly reduce execution time under stringent memory availability. Overall, this study demonstrates that scheduling policies can significantly reduce the execution time of multi-inference jobs.
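To make the idea of memory-aware scheduling concrete, the sketch below shows one way such a policy could batch networks under a memory budget. This is a minimal illustration only, not EdgeCaffe's actual MeMa implementation: the `Network` type, the largest-footprint-first ordering, and the first-fit packing heuristic are all assumptions made for the example.

```python
# Illustrative sketch of a memory-aware batch scheduler in the spirit of MeMa.
# Assumption: networks are packed largest-footprint-first into batches that
# fit a memory budget; batches then run sequentially, loading and unloading
# networks between them. This is NOT EdgeCaffe's actual algorithm.
from dataclasses import dataclass


@dataclass
class Network:
    name: str
    memory_mb: int   # peak memory required while the network is loaded
    runtime_ms: int  # estimated inference time


def mema_schedule(networks, memory_budget_mb):
    """Greedily group networks into batches that respect the memory budget.

    Networks are sorted by memory footprint (descending) and packed
    first-fit into batches whose combined footprint stays under the cap.
    """
    pending = sorted(networks, key=lambda n: n.memory_mb, reverse=True)
    batches = []
    while pending:
        batch, used = [], 0
        for net in list(pending):
            if used + net.memory_mb <= memory_budget_mb:
                batch.append(net)
                used += net.memory_mb
                pending.remove(net)
        if not batch:
            # A single network exceeds the budget; it can never be scheduled.
            raise ValueError(f"{pending[0].name} does not fit in the budget")
        batches.append(batch)
    return batches


if __name__ == "__main__":
    nets = [Network("vgg16", 900, 150), Network("resnet50", 420, 90),
            Network("mobilenet", 60, 20), Network("squeezenet", 30, 15)]
    for i, batch in enumerate(mema_schedule(nets, memory_budget_mb=1000)):
        print(f"batch {i}: {[n.name for n in batch]}")
```

Under stringent memory limits such a grouping reduces the number of load/unload cycles per network, which is one plausible mechanism for the execution-time reductions the study reports.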