Energy-Aware Vision Model Partitioning for Edge AI

Conference Paper (2025)
Author(s)

D. Katare (TU Delft - Information and Communication Technology)

Mengying Zhou (Fudan University)

Y. Chen (TU Delft - Computer Graphics and Visualisation, Fudan University)

M.F.W.H.A. Janssen (TU Delft - Engineering, Systems and Services)

Aaron Ding (TU Delft - Information and Communication Technology)

Research Group
Information and Communication Technology
DOI
https://doi.org/10.1145/3672608.3707792
Publication Year
2025
Language
English
Pages (from-to)
671-678
ISBN (electronic)
979-8-4007-0629-5
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward, or distribute the text or any part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Deploying scalable Vision Transformer (ViT) applications on mobile and edge devices is constrained by limited memory and computational resources. Existing model development and deployment strategies rely on distributed computing and inference methods such as federated learning, split computing, collaborative inference, and edge-cloud offloading. While these strategies ease deployment, they do not optimize memory usage or processing efficiency, resulting in increased energy consumption. This paper reduces energy consumption by introducing adaptive model partitioning mechanisms and dynamic scaling methods for ViTs such as EfficientViT and TinyViT, adjusting model complexity to the available computational resources and operating conditions. We implement energy-efficient strategies that minimize inter-layer communication for distributed machine learning across edge devices, thereby reducing the energy consumed by both data movement and computation. Our evaluations on a series of benchmark models show improvements of up to a 32.6% reduction in latency and 16.6% energy savings, while keeping the loss in mean average precision within 2.5 to 4.5% of the baseline models. These results show that our approach is a practical way to improve the sustainability and efficiency of edge AI.
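
To make the partitioning idea concrete, the sketch below illustrates one way an energy-aware split point could be chosen for a layered vision model. It is a minimal illustration, not the paper's implementation: the `LayerProfile` fields, the per-layer energy numbers, and the linear link-energy model (`transfer_energy_mj`) are all hypothetical placeholders that a real system would obtain through on-device profiling.

```python
# Minimal sketch of energy-aware partition-point selection (hypothetical
# model, not the authors' implementation). All numbers are placeholders.

from dataclasses import dataclass


@dataclass
class LayerProfile:
    name: str
    device_energy_mj: float  # energy to run this layer on the edge device
    remote_energy_mj: float  # device-side energy while this layer runs remotely
    activation_kb: float     # size of the activation passed to the next layer


def transfer_energy_mj(kilobytes: float, mj_per_kb: float = 0.4) -> float:
    """Energy to ship an activation over the link (assumed linear model)."""
    return kilobytes * mj_per_kb


def choose_split(layers: list[LayerProfile], input_kb: float = 600.0) -> int:
    """Return the index of the first layer to offload (len(layers) = all local).

    Device-side energy for a split at k:
      local energy of layers [0, k) + link energy for the activation crossing
      the split + residual device energy while layers [k, n) run remotely.
    """
    n = len(layers)
    best_k, best_cost = n, float("inf")
    for k in range(n + 1):
        local = sum(l.device_energy_mj for l in layers[:k])
        remote = sum(l.remote_energy_mj for l in layers[k:])
        if 0 < k < n:
            link = transfer_energy_mj(layers[k - 1].activation_kb)
        elif k == 0:
            link = transfer_energy_mj(input_kb)  # ship the raw input
        else:
            link = 0.0  # everything runs locally, nothing crosses the link
        cost = local + link + remote
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


if __name__ == "__main__":
    # Hypothetical profile of a small ViT-style backbone.
    profile = [
        LayerProfile("patch_embed", 6.0, 1.0, 190.0),
        LayerProfile("stage1", 14.0, 2.0, 95.0),
        LayerProfile("stage2", 40.0, 3.0, 48.0),
        LayerProfile("head", 9.0, 1.5, 0.5),
    ]
    k = choose_split(profile)
    print(f"offload from layer index {k} onward")
```

The search naturally favors splitting where activations are small, which is what suppresses inter-layer communication energy; a dynamic-scaling variant would re-run this search whenever the measured link quality or compute budget changes.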