Energy-Aware Vision Model Partitioning for Edge AI

Conference Paper (2025)
Author(s)

D. Katare (TU Delft - Information and Communication Technology)

Mengying Zhou (Fudan University)

Y. Chen (TU Delft - Computer Graphics and Visualisation, Fudan University)

M.F.W.H.A. Janssen (TU Delft - Engineering, Systems and Services)

Aaron Ding (TU Delft - Information and Communication Technology)

Research Group
Information and Communication Technology
DOI
https://doi.org/10.1145/3672608.3707792
Publication Year
2025
Language
English
Pages (from-to)
671-678
ISBN (electronic)
979-8-4007-0629-5
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward, or distribute the text or any part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Deploying scalable Vision Transformer (ViT) applications on mobile and edge devices is constrained by limited memory and computational resources. Existing model development and deployment strategies rely on distributed computing and inference methods such as federated learning, split computing, collaborative inference, and edge-cloud offloading. While these strategies ease deployment, they do not optimize memory usage or processing efficiency, resulting in increased energy consumption. This paper reduces energy consumption by introducing adaptive model partitioning mechanisms and dynamic scaling methods for ViTs such as EfficientViT and TinyViT, adjusting model complexity to the available computational resources and operating conditions. We implement energy-efficient strategies that minimize inter-layer communication for distributed machine learning across edge devices, thereby reducing the energy consumed by both data movement and computation. Our evaluations on a series of benchmark models show improvements of up to a 32.6% reduction in latency and 16.6% energy savings, while keeping the loss in mean average precision within 2.5 to 4.5% of the baseline models. These results show that our approach is a practical way to improve the sustainability and efficiency of edge AI.
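
To make the partitioning idea concrete, the sketch below illustrates one way an energy-aware split point could be chosen for a layered vision model. It is a minimal illustration, not the paper's implementation: the `LayerProfile` fields, the per-layer energy numbers, and the linear link-energy model (`transfer_energy_mj`) are all hypothetical placeholders that a real system would obtain through on-device profiling.

```python
# Minimal sketch of energy-aware partition-point selection (hypothetical
# model, not the authors' implementation). All numbers are placeholders.

from dataclasses import dataclass


@dataclass
class LayerProfile:
    name: str
    device_energy_mj: float  # energy to run this layer on the edge device
    remote_energy_mj: float  # device-side energy while this layer runs remotely
    activation_kb: float     # size of the activation passed to the next layer


def transfer_energy_mj(kilobytes: float, mj_per_kb: float = 0.4) -> float:
    """Energy to ship an activation over the link (assumed linear model)."""
    return kilobytes * mj_per_kb


def choose_split(layers: list[LayerProfile], input_kb: float = 600.0) -> int:
    """Return the index of the first layer to offload (len(layers) = all local).

    Device-side energy for a split at k:
      local energy of layers [0, k) + link energy for the activation crossing
      the split + residual device energy while layers [k, n) run remotely.
    """
    n = len(layers)
    best_k, best_cost = n, float("inf")
    for k in range(n + 1):
        local = sum(l.device_energy_mj for l in layers[:k])
        remote = sum(l.remote_energy_mj for l in layers[k:])
        if 0 < k < n:
            link = transfer_energy_mj(layers[k - 1].activation_kb)
        elif k == 0:
            link = transfer_energy_mj(input_kb)  # ship the raw input
        else:
            link = 0.0  # everything runs locally, nothing crosses the link
        cost = local + link + remote
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


if __name__ == "__main__":
    # Hypothetical profile of a small ViT-style backbone.
    profile = [
        LayerProfile("patch_embed", 6.0, 1.0, 190.0),
        LayerProfile("stage1", 14.0, 2.0, 95.0),
        LayerProfile("stage2", 40.0, 3.0, 48.0),
        LayerProfile("head", 9.0, 1.5, 0.5),
    ]
    k = choose_split(profile)
    print(f"offload from layer index {k} onward")
```

The search naturally favors splitting where activations are small, which is what suppresses inter-layer communication energy; a dynamic-scaling variant would re-run this search whenever the measured link quality or compute budget changes.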