Compact transformer variants for synthetic time series forecasting

Journal Article (2026)
Author(s)

Ali Forootani (Max Planck Institute of Geoanthropology, UZF - Helmholtz Centre for Environmental Research)

Mohammad Khosravi (TU Delft - Mechanical Engineering)

Research Group
Team Khosravi
DOI related publication
https://doi.org/10.1016/j.neucom.2026.133140 (final published version)
Publication Year
2026
Language
English
Journal title
Neurocomputing
Volume number
677
Article number
133140
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper presents a unified and systematic study of compact Transformer architectures for time series forecasting. We introduce a modular framework that standardizes three widely used Transformer families—Autoformer, Informer, and PatchTST—into three principled architectural variants: Minimal, Standard, and Full, enabling controlled analysis of model capacity, inductive bias, and computational complexity. For each family, we provide consistent mathematical formulations, layer-wise descriptions, and end-to-end complexity characterizations. We conduct over 1500 controlled experiments on ten synthetic time series under varying patch lengths, forecast horizons, and noise levels. The results reveal clear and reproducible performance regimes: PatchTST Standard achieves the best overall accuracy and noise robustness, Autoformer variants excel on smooth and trend-dominated signals, and Informer variants exhibit sensitivity to noise and long horizons despite improved scalability. Complementing the empirical analysis, we derive new theoretical results that quantify noise attenuation, bias–variance trade-offs, and approximation–complexity guarantees specific to each architectural family. Finally, we demonstrate that these compact Transformer variants serve as effective and interpretable temporal encoders within an operator–theoretic forecasting framework. By embedding Autoformer, Informer, and PatchTST backbones into a Koopman-based latent dynamics model, we extend their applicability beyond synthetic benchmarks to real-world climate, cryptocurrency and electricity generation time series. Together, these results position compact, modular Transformers as scalable and theoretically grounded building blocks for scientific time series forecasting.
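
The experimental setup described in the abstract (synthetic signals, varying noise levels, patch lengths, and forecast horizons) can be pictured with a minimal sketch. Everything below is an illustrative assumption rather than the authors' code: the signal shape, the function names `make_series`, `make_patches`, and `make_sample`, and the parameter values are hypothetical, and the patches are non-overlapping for simplicity, whereas patch-based models may use overlapping patches with a stride.

```python
import math
import random


def make_series(n_steps, noise_std, seed=0):
    """Sine wave plus linear trend plus Gaussian noise (illustrative signal only)."""
    rng = random.Random(seed)
    return [
        math.sin(2 * math.pi * t / 50) + 0.01 * t + rng.gauss(0.0, noise_std)
        for t in range(n_steps)
    ]


def make_patches(window, patch_len):
    """Split a window into non-overlapping patches, dropping any remainder."""
    n_patches = len(window) // patch_len
    return [window[i * patch_len:(i + 1) * patch_len] for i in range(n_patches)]


def make_sample(series, start, context_len, horizon, patch_len):
    """Return (patched context, forecast target) for one training sample."""
    context = series[start:start + context_len]
    target = series[start + context_len:start + context_len + horizon]
    return make_patches(context, patch_len), target


# Hypothetical settings: 96-step context, 24-step horizon, patch length 16.
series = make_series(n_steps=400, noise_std=0.1)
patches, target = make_sample(series, start=0, context_len=96, horizon=24, patch_len=16)
print(len(patches), len(patches[0]), len(target))  # prints "6 16 24"
```

Sweeping `noise_std`, `patch_len`, and `horizon` over a grid is one way such a controlled comparison could be organized; the actual grid used in the paper is not stated in the abstract.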