Compact transformer variants for synthetic time series forecasting
Ali Forootani (Max Planck Institute of Geoanthropology, UFZ - Helmholtz Centre for Environmental Research)
Mohammad Khosravi (TU Delft - Mechanical Engineering)
Abstract
This paper presents a unified and systematic study of compact Transformer architectures for time series forecasting. We introduce a modular framework that standardizes three widely used Transformer families (Autoformer, Informer, and PatchTST) into three principled architectural variants: Minimal, Standard, and Full. This standardization enables controlled analysis of model capacity, inductive bias, and computational complexity. For each family, we provide consistent mathematical formulations, layer-wise descriptions, and end-to-end complexity characterizations. We conduct over 1500 controlled experiments on ten synthetic time series under varying patch lengths, forecast horizons, and noise levels. The results reveal clear and reproducible performance regimes: PatchTST Standard achieves the best overall accuracy and noise robustness, Autoformer variants excel on smooth and trend-dominated signals, and Informer variants are sensitive to noise and long horizons despite their improved scalability. Complementing the empirical analysis, we derive new theoretical results that quantify noise attenuation, bias–variance trade-offs, and approximation–complexity guarantees specific to each architectural family. Finally, we demonstrate that these compact Transformer variants serve as effective and interpretable temporal encoders within an operator-theoretic forecasting framework. By embedding Autoformer, Informer, and PatchTST backbones into a Koopman-based latent dynamics model, we extend their applicability beyond synthetic benchmarks to real-world climate, cryptocurrency, and electricity generation time series. Together, these results position compact, modular Transformers as scalable and theoretically grounded building blocks for scientific time series forecasting.
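To make the experimental axes named in the abstract concrete (patch length, forecast horizon, noise level), the following is a minimal illustrative sketch, not the authors' code. The signal (a linear trend plus a period-24 sinusoid), the non-overlapping patching scheme, and the names `make_series` and `make_patched_windows` are all assumptions introduced here for illustration; the paper's actual synthetic signals and patching strategy may differ.

```python
import numpy as np


def make_series(n_steps, noise_std, seed=0):
    """Hypothetical synthetic signal: linear trend + sinusoid + Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_steps)
    clean = 0.01 * t + np.sin(2 * np.pi * t / 24)
    return clean + noise_std * rng.standard_normal(n_steps)


def make_patched_windows(series, context_len, horizon, patch_len):
    """Slice a 1-D series into (patched context, target) pairs.

    Assumption: patches are non-overlapping, so context_len must be a
    multiple of patch_len. Returns X of shape
    (n_windows, n_patches, patch_len) and Y of shape (n_windows, horizon).
    """
    if context_len % patch_len != 0:
        raise ValueError("context_len must be a multiple of patch_len")
    n_windows = len(series) - context_len - horizon + 1
    if n_windows <= 0:
        raise ValueError("series too short for this context_len and horizon")
    n_patches = context_len // patch_len
    X = np.stack([
        series[i:i + context_len].reshape(n_patches, patch_len)
        for i in range(n_windows)
    ])
    Y = np.stack([
        series[i + context_len:i + context_len + horizon]
        for i in range(n_windows)
    ])
    return X, Y


series = make_series(n_steps=200, noise_std=0.1)
X, Y = make_patched_windows(series, context_len=48, horizon=12, patch_len=8)
print(X.shape, Y.shape)  # (141, 6, 8) (141, 12)
```

Sweeping `noise_std`, `horizon`, and `patch_len` over a grid and repeating across several signals would give a controlled experiment grid of the kind the abstract describes, though the specific grid values used in the paper are not stated here.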