On Scheduling Early-Exit Layers for Model Pipeline in 6G-Based Edge Inference

Journal Article (2025)
Author(s)

Yuxiao Liu (Beijing Institute of Technology)

Rui Han (Beijing Institute of Technology)

Qinglong Zhang (Beijing Institute of Technology)

Haiting Hou (Beijing Institute of Technology)

Chi Harold Liu (Beijing Institute of Technology)

Lydia Y. Chen (TU Delft - Data-Intensive Systems)

Research Group
Data-Intensive Systems
DOI related publication
https://doi.org/10.1109/MNET.2024.3520555
More Info
Publication Year
2025
Language
English
Issue number
5
Volume number
39
Pages (from-to)
131-137
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

When edge intelligence applications run over 6G networks, the model pipeline effectively reduces inference latency by parallelizing layers across multiple edge devices. Today's edge inference systems usually employ a static architecture of layers for pipeline parallelism, yet early-exit dynamically skips some of those layers, which may significantly degrade system throughput. In this paper, we introduce DensePipe, an online layer scheduling approach that optimally allocates early-exit layers to edge devices to maximize their throughput in the model pipeline. To this end, DensePipe profiles the skipping probabilities of all network layers under early-exit. At run-time, DensePipe maximizes pipeline throughput by balancing the processing of all unskipped layers among devices according to the current loads and device resource utilization. We implement DensePipe with Transformer models and demonstrate its effectiveness against state-of-the-art pipeline methods. Comparative experiments show that DensePipe successfully finds the best devices for most of the layers and significantly improves throughput by 3.09x.
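
The balancing idea in the abstract can be illustrated with a minimal sketch. This is not DensePipe's algorithm, which the abstract does not specify. It assumes identical devices, contiguous pipeline stages, and known per-layer costs, and it ignores run-time load and resource utilization. The names `expected_costs` and `balance_stages` and all numbers are hypothetical. Each layer's cost is weighted by the probability that it is not skipped, and a dynamic program then picks the stage boundaries that minimize the heaviest stage, since the heaviest stage bounds pipeline throughput.

```python
from typing import List, Sequence


def expected_costs(costs: Sequence[float], skip_probs: Sequence[float]) -> List[float]:
    """Expected per-request cost of each layer: its cost times the chance it runs."""
    return [c * (1.0 - p) for c, p in zip(costs, skip_probs)]


def balance_stages(loads: Sequence[float], num_devices: int) -> List[List[int]]:
    """Split layers into contiguous, non-empty stages minimizing the heaviest stage.

    Illustrative assumption: devices are identical and len(loads) >= num_devices.
    """
    n = len(loads)
    if not 1 <= num_devices <= n:
        raise ValueError("need between 1 and len(loads) devices")

    # prefix[i] is the total load of layers 0..i-1.
    prefix = [0.0]
    for load in loads:
        prefix.append(prefix[-1] + load)

    inf = float("inf")
    # best[d][i]: smallest achievable bottleneck when the first i layers use d devices.
    best = [[inf] * (n + 1) for _ in range(num_devices + 1)]
    # cut[d][i]: index where the last of those d stages starts.
    cut = [[0] * (n + 1) for _ in range(num_devices + 1)]
    best[0][0] = 0.0

    for d in range(1, num_devices + 1):
        for i in range(d, n + 1):
            for j in range(d - 1, i):
                bottleneck = max(best[d - 1][j], prefix[i] - prefix[j])
                if bottleneck < best[d][i]:
                    best[d][i] = bottleneck
                    cut[d][i] = j

    # Walk the recorded cuts backwards to recover the stages.
    stages: List[List[int]] = []
    end = n
    for d in range(num_devices, 0, -1):
        start = cut[d][end]
        stages.append(list(range(start, end)))
        end = start
    stages.reverse()
    return stages


# Hypothetical profile: four equally costly layers, later ones often skipped.
loads = expected_costs([4.0, 4.0, 4.0, 4.0], [0.0, 0.0, 0.5, 0.75])
stages = balance_stages(loads, num_devices=2)
```

With these made-up numbers the expected loads are 4, 4, 2 and 1, so the sketch assigns layer 0 to the first device and layers 1 to 3 to the second (bottleneck 7), whereas an even two-and-two split that ignores skipping would leave the first device with a load of 8.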

Files

On_Scheduling_Early-Exit_Layer... (pdf)
(pdf | 1.44 Mb)
- Embargo expired on 16-01-2026
Taverne