EdgeTA

Neuron-Grained Scaling of Foundation Models in Edge-Side Retraining

Journal Article (2025)
Author(s)

Qinglong Zhang (Beijing Institute of Technology)

Rui Han (Beijing Institute of Technology)

Chi Harold Liu (Beijing Institute of Technology)

Guoren Wang (Beijing Institute of Technology)

Song Guo (The Hong Kong University of Science and Technology)

Lydia Y. Chen (TU Delft - Data-Intensive Systems)

Research Group
Data-Intensive Systems
DOI related publication
https://doi.org/10.1109/TMC.2024.3504859
Publication Year
2025
Language
English
Bibliographical Note
Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
Issue number
4
Volume number
24
Pages (from-to)
2690-2707
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Foundation models (FMs) such as large language models are becoming the backbone technology for artificial intelligence systems. Deploying multiple FMs on edge devices is particularly challenging: these devices not only have limited computational resources but also encounter unseen input data from evolving domains or learning tasks. When new data arrives, prior art on FMs mainly focuses on retraining compressed models with predetermined network architectures, which limits the ability of edge devices to efficiently achieve high accuracy for FMs. In this paper, we propose EdgeTA, a neuron-grained FM scaling system that maximizes the overall accuracy of FMs promptly in response to their data dynamics. EdgeTA's key design features in scaling are (i) a proxy mechanism, which adaptively transforms an FM into a compact architecture that retains the neurons most important to the input data, and (ii) a neuron-grained scheduler, which jointly optimizes model sizes and resource allocation for all FMs on edge devices. Under a tight retraining window and limited device resources, EdgeTA can achieve most of the original FM's accuracy at much smaller retraining cost. We implement EdgeTA on FMs for natural language processing, computer vision, and multimodal applications. Comparisons against state-of-the-art techniques show that our approach improves accuracy by 21.88% and reduces memory footprint and energy consumption by 27.14% and 65.65%, respectively, while further achieving a 15.96% overall accuracy improvement via neuron-grained scheduling.
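The abstract's idea of keeping only the neurons most important to the input data can be illustrated with a small sketch. This is not EdgeTA's proxy mechanism, which the abstract does not specify; it assumes a deliberately simple importance score (mean absolute activation over a batch), and the function names `neuron_importance`, `select_neurons`, and `compact_layer` as well as the `keep_ratio` parameter are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def neuron_importance(activations: Matrix) -> List[float]:
    """Assumed score: mean absolute activation per neuron.

    `activations` has one row per input sample and one column per neuron.
    """
    n_samples = len(activations)
    n_neurons = len(activations[0])
    return [
        sum(abs(row[j]) for row in activations) / n_samples
        for j in range(n_neurons)
    ]


def select_neurons(activations: Matrix, keep_ratio: float) -> List[int]:
    """Return the indices of the highest-scoring neurons, in ascending order."""
    scores = neuron_importance(activations)
    k = max(1, round(keep_ratio * len(scores)))  # always keep at least one
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def compact_layer(weights: Matrix, kept: List[int]) -> Matrix:
    """Build a smaller layer by keeping only the selected neurons' weight rows.

    `weights` has one row per output neuron of the layer.
    """
    return [weights[j] for j in kept]


# Two samples, four neurons: neurons 1 and 3 respond most strongly.
batch = [[0.1, -2.0, 0.0, 1.0],
         [0.2, 3.0, 0.0, -1.0]]
layer = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]

kept = select_neurons(batch, keep_ratio=0.5)
small_layer = compact_layer(layer, kept)
```

Because the score is computed from the current batch, the retained set changes as the input distribution shifts, which is the sense in which such a selection adapts to the data. A real system would also have to adjust the input dimension of the following layer and would likely use a more careful importance criterion.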

Files

EdgeTA_Neuron-Grained_Scaling_... (pdf)
(pdf | 8.02 Mb)
- Embargo expired on 02-06-2025
License info not available