EdgeTA



Neuron-Grained Scaling of Foundation Models in Edge-Side Retraining

Journal Article (2025)

Author(s)

Qinglong Zhang (Beijing Institute of Technology)

Rui Han (Beijing Institute of Technology)

Chi Harold Liu (Beijing Institute of Technology)

Guoren Wang (Beijing Institute of Technology)

Song Guo (The Hong Kong University of Science and Technology)

Lydia Y. Chen (TU Delft - Data-Intensive Systems)

Research Group

Data-Intensive Systems

DOI related publication

https://doi.org/10.1109/TMC.2024.3504859

Keywords: Resource scheduling; Evolving data; Foundation model; Neuron-grained scaling; Retraining

To reference this document use:

https://resolver.tudelft.nl/uuid:7151132d-8acc-48ef-af9f-a785c56b54e2


Publication Year

2025

Language

English


Bibliographical Note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Issue number

4

Volume number

24

Pages (from-to)

2690-2707

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Foundation models (FMs) such as large language models are becoming the backbone technology of artificial intelligence systems. Deploying multiple FMs on edge devices is particularly challenging: these devices not only have limited computational resources but also encounter unseen input data from evolving domains or learning tasks. When new data arrives, existing approaches for FMs mainly retrain compressed models with predetermined network architectures, which limits how efficiently edge devices can achieve high FM accuracy. In this paper, we propose EdgeTA, a neuron-grained FM scaling system that promptly maximizes the overall accuracy of FMs in response to their data dynamics. EdgeTA's key scaling design features are (i) a proxy mechanism, which adaptively transforms an FM into a compact architecture that retains the neurons most important to the input data, and (ii) a neuron-grained scheduler, which jointly optimizes the model sizes and resource allocation for all FMs on an edge device. Under a tight retraining window and limited device resources, EdgeTA can achieve most of the original FM's accuracy at a much smaller retraining cost. We implement EdgeTA on FMs for natural language processing, computer vision and multimodal applications. Compared with state-of-the-art techniques, our approach improves accuracy by 21.88% and reduces memory footprint and energy consumption by 27.14% and 65.65%, respectively, while neuron-grained scheduling yields a further 15.96% improvement in overall accuracy.
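The abstract names two mechanisms without detailing them. As a rough illustration only, the sketch below makes two assumptions that are not taken from the paper: (i) a neuron's importance is scored as its mean absolute activation on a batch of newly arrived data, and the compact model keeps the top-scoring fraction of neurons; (ii) each FM offers a few candidate sizes with known (estimated accuracy, memory) pairs, so scheduling becomes a multiple-choice knapsack solved by dynamic programming over an integer memory budget. The function names (`neuron_importance`, `select_neurons`, `prune_rows`, `schedule`) and the numbers in the example are hypothetical; EdgeTA's actual importance criterion and its joint optimizer of model sizes and resources may differ.

```python
from typing import List, Sequence, Tuple

# --- (i) Proxy-style neuron selection (hypothetical importance criterion) ---

def neuron_importance(activations: Sequence[Sequence[float]]) -> List[float]:
    """Score each neuron by its mean absolute activation over a batch.

    activations[s][j] is neuron j's output on sample s of the new data.
    """
    n_samples = len(activations)
    n_neurons = len(activations[0])
    return [
        sum(abs(row[j]) for row in activations) / n_samples
        for j in range(n_neurons)
    ]


def select_neurons(importance: Sequence[float], keep_ratio: float) -> List[int]:
    """Return indices of the top-k neurons (k ~ keep_ratio * width), in original order."""
    k = max(1, round(len(importance) * keep_ratio))
    ranked = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
    return sorted(ranked[:k])


def prune_rows(weight: Sequence[Sequence[float]], keep: Sequence[int]) -> List[List[float]]:
    """Keep only the weight rows (output neurons) listed in `keep`."""
    return [list(weight[j]) for j in keep]


# --- (ii) Scheduling as a multiple-choice knapsack (hypothetical formulation) ---

def schedule(
    options: Sequence[Sequence[Tuple[float, int]]], budget: int
) -> Tuple[float, List[int]]:
    """Pick one (estimated_accuracy, memory_units) option per model so that the
    summed accuracy is maximal and the summed memory stays within `budget`.

    Returns (total_accuracy, chosen option index per model).
    """
    # states maps memory used so far -> best (accuracy, picks) reaching it
    states = {0: (0.0, [])}
    for model_opts in options:
        nxt = {}
        for used, (acc, picks) in states.items():
            for i, (opt_acc, opt_mem) in enumerate(model_opts):
                total_mem = used + opt_mem
                if total_mem > budget:
                    continue  # this option would exceed the device budget
                cand = (acc + opt_acc, picks + [i])
                if total_mem not in nxt or cand[0] > nxt[total_mem][0]:
                    nxt[total_mem] = cand
        if not nxt:
            raise ValueError("no configuration fits within the memory budget")
        states = nxt
    return max(states.values(), key=lambda s: s[0])


if __name__ == "__main__":
    acts = [[1.0, -3.0, 0.5], [-1.0, 1.0, 0.5]]
    imp = neuron_importance(acts)                # [1.0, 2.0, 0.5]
    print(select_neurons(imp, keep_ratio=0.67))  # [0, 1]

    # Two FMs, each with three candidate sizes: (estimated accuracy, memory units)
    fm_a = [(0.90, 4), (0.85, 2), (0.70, 1)]
    fm_b = [(0.80, 3), (0.78, 2), (0.60, 1)]
    total, picks = schedule([fm_a, fm_b], budget=5)
    print(picks)  # [1, 0]
```

In this toy example, shrinking both models evenly is not optimal: the DP gives the second FM its largest size and the first a mid-sized variant. This illustrates why the abstract frames model sizes and resource allocation as a joint decision. The sketch models only a single memory budget; retraining windows and compute allocation are not represented.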

Files

EdgeTA_Neuron-Grained_Scaling_... (pdf)

(pdf | 8.02 Mb)

- Embargo expired on 02-06-2025

License info not available