Spatio-Temporal Transformer for Load Estimation using EMG and IMU in Assistive Robotics
B.C. Wingen (TU Delft - Mechanical Engineering)
A.H.A. Stienen – Mentor (TU Delft - Mechanical Engineering)
X. Zhang – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Abstract
Intuitive control of assistive robotic devices, such as exoskeletons and arm supports, requires inferring the user’s interaction with objects in the environment. Surface electromyography (EMG) and inertial measurement units (IMUs) provide complementary information about muscle activation and limb kinematics, but interpreting these sensory modalities for real-time control remains challenging. Deep learning is effective for modeling human motion intention, but has seen limited use in estimating the handheld load during object manipulation. This paper proposes a sensor-fused spatio-temporal transformer (ST-Transformer) that regresses the handheld load from synchronized EMG and IMU signals, together with a real-time acquisition and processing pipeline for an arm support device. Data from 17 participants performing a weight-movement task spanning six weight classes (0–6 kg) were used. EMG and IMU normalization, dataset-balancing augmentation, dropout, and weight decay were applied to improve cross-participant generalization. Trained and tested on the same participants, the sensor-fused model estimated load accurately (all metrics participant-class-balanced; R² = 0.935, MAE = 0.316 kg, RMSE = 0.441 kg) and significantly outperformed an EMG-only model (R² = 0.913, MAE = 0.380 kg, RMSE = 0.520 kg). Under Leave-One-Participant-Out (LOPO) cross-validation, however, the fused model (R² = 0.853, MAE = 0.536 kg, RMSE = 0.680 kg) retained only a slight, statistically non-significant edge over EMG alone (R² = 0.839, MAE = 0.546 kg, RMSE = 0.703 kg), while the IMU-only model degraded sharply. This indicates that the transferable load information is carried primarily by muscle activation, while the complementary IMU contribution is largely entangled with participant-specific characteristics.
An attribution analysis localizes the load-relevant signal to the forearm muscles, indicating that a compact forearm-worn sensor set captures most of the usable signal, and the model (approximately 1.03 × 10⁶ parameters) is feasible for real-time on-device inference on current microcontrollers.
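
The evaluation protocol named in the abstract can be illustrated with a minimal sketch. It shows how Leave-One-Participant-Out splits might be formed and how a participant-class-balanced MAE and RMSE could be computed. The abstract does not define the balancing, so the sketch assumes that every (participant, weight class) cell contributes equally to the average and that the true load value serves as the class label. The function names `lopo_splits` and `balanced_errors` are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict
from math import sqrt


def lopo_splits(participants):
    """Yield (held_out_id, train_indices, test_indices) per participant.

    Each participant is held out exactly once; all remaining
    participants form the training set for that fold.
    """
    for held_out in sorted(set(participants)):
        train = [i for i, p in enumerate(participants) if p != held_out]
        test = [i for i, p in enumerate(participants) if p == held_out]
        yield held_out, train, test


def balanced_errors(y_true, y_pred, participants):
    """Return (MAE, RMSE) averaged over (participant, class) cells.

    Assumption: the true load in kg is used as the weight-class label,
    and each cell is weighted equally regardless of its sample count.
    """
    cells = defaultdict(list)
    for t, p, pid in zip(y_true, y_pred, participants):
        cells[(pid, t)].append(p - t)
    # Per-cell mean absolute error and mean squared error.
    cell_mae = [sum(abs(e) for e in errs) / len(errs) for errs in cells.values()]
    cell_mse = [sum(e * e for e in errs) / len(errs) for errs in cells.values()]
    mae = sum(cell_mae) / len(cell_mae)
    rmse = sqrt(sum(cell_mse) / len(cell_mse))
    return mae, rmse


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    ids = ["a", "a", "a", "b"]
    true_kg = [0.0, 0.0, 2.0, 2.0]
    pred_kg = [0.5, 0.5, 2.0, 3.0]
    for held_out, train, test in lopo_splits(ids):
        print(held_out, train, test)
    print(balanced_errors(true_kg, pred_kg, ids))
```

Weighting cells rather than samples keeps a participant or weight class with many windows from dominating the score, which is one plausible reading of "participant-class-balanced". The thesis itself may define the balancing differently.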