Energy-Aware Vision Model Partitioning for Edge AI

Conference Paper (2025)
Author(s)

Dewant Katare (TU Delft - Technology, Policy and Management)

Mengying Zhou (Fudan University)

Yang Chen (TU Delft - Electrical Engineering, Mathematics and Computer Science, Fudan University)

Marijn Janssen (TU Delft - Technology, Policy and Management)

Aaron Yi Ding (TU Delft - Technology, Policy and Management)

Research Group
Information and Communication Technology
DOI related publication
https://doi.org/10.1145/3672608.3707792 Final published version
Publication Year
2025
Language
English
Pages (from-to)
671-678
Publisher
ACM
ISBN (electronic)
9798400706295
Event
40th Annual ACM Symposium on Applied Computing, SAC 2025 (2025-03-31 - 2025-04-04), Catania, Italy
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Deploying scalable Vision Transformer (ViT) applications on mobile and edge devices is constrained by limited memory and computational resources. Existing model development and deployment strategies include distributed computing and inference methods such as federated learning, split computing, collaborative inference, and edge-cloud offloading. While these strategies have deployment advantages, they fail to optimize memory usage and processing efficiency, resulting in increased energy consumption. This paper reduces energy consumption by introducing adaptive model partitioning mechanisms and dynamic scaling methods for ViTs such as EfficientViT and TinyViT, adjusting model complexity based on the available computational resources and operating conditions. We implement energy-efficient strategies that minimize inter-layer communication for distributed machine learning across edge devices, thereby reducing the energy consumed by data flow and computation. Our evaluations on a series of benchmark models show improvements, including up to a 32.6% reduction in latency and 16.6% energy savings, while keeping the loss in mean average precision within 2.5 to 4.5% of baseline models. These results show that our proposal is a practical approach for improving the sustainability and efficiency of edge AI.
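
To make the idea of energy-aware partitioning concrete, the following is a minimal sketch of choosing a split point that minimizes device-side energy. It is not the authors' algorithm. The `Block` type, the `best_split` function, the additive cost model (local compute energy plus transmission energy for the tensor crossing the split), and all numbers are hypothetical assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """One stage of a vision model. All figures are illustrative assumptions."""
    name: str
    compute_mj: float  # assumed energy to run this block on the local device (mJ)
    output_kb: float   # assumed size of the activation this block emits (KB)


def best_split(blocks: List[Block], input_kb: float,
               tx_mj_per_kb: float) -> Tuple[int, float]:
    """Return (k, energy_mj) where blocks[:k] run locally and the rest run remotely.

    k == 0 transmits the raw input; k == len(blocks) runs everything locally
    and transmits nothing (the final prediction is assumed negligible in size).
    """
    best_k, best_energy = 0, input_kb * tx_mj_per_kb
    local = 0.0
    for k, block in enumerate(blocks, start=1):
        local += block.compute_mj
        # Only the tensor crossing the split boundary is transmitted.
        tx = 0.0 if k == len(blocks) else block.output_kb * tx_mj_per_kb
        energy = local + tx
        if energy < best_energy:
            best_k, best_energy = k, energy
    return best_k, best_energy


# Hypothetical four-stage model; the values are made up for the example.
MODEL = [
    Block("patch_embed", 2.0, 300.0),
    Block("stage1", 5.0, 150.0),
    Block("stage2", 8.0, 40.0),
    Block("stage3", 20.0, 10.0),
]

if __name__ == "__main__":
    # The chosen split moves as the assumed per-KB transmission cost changes.
    for tx_cost in (0.01, 0.1, 1.0):
        k, energy = best_split(MODEL, input_kb=600.0, tx_mj_per_kb=tx_cost)
        print(f"tx={tx_cost} mJ/KB -> run {k} block(s) locally, {energy:.1f} mJ")
```

Under this toy cost model, a cheap link favors an early split and an expensive link favors running the whole model locally, which mirrors the abstract's point that the partition should adapt to operating conditions. A realistic version would also account for latency, memory limits, and the energy of the receiving device.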