Compact transformer variants for synthetic time series forecasting

Journal Article (2026)

Author(s)

Ali Forootani (Max Planck Institute of Geoanthropology, UFZ - Helmholtz Centre for Environmental Research)

Mohammad Khosravi (TU Delft - Mechanical Engineering)

Research Group

Team Khosravi

Time series forecasting, Koopman operator, Informer, Autoformer, PatchTST, Transformer models

DOI related publication

https://doi.org/10.1016/j.neucom.2026.133140 (final published version)

To reference this document use

https://resolver.tudelft.nl/uuid:fd10723d-7dd8-4383-b119-5ce7e8a2a368

Publication Year

2026

Language

English

Journal title

Neurocomputing

Volume number

677

Article number

133140

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper presents a unified and systematic study of compact Transformer architectures for time series forecasting. We introduce a modular framework that standardizes three widely used Transformer families (Autoformer, Informer, and PatchTST) into three principled architectural variants: Minimal, Standard, and Full, enabling controlled analysis of model capacity, inductive bias, and computational complexity. For each family, we provide consistent mathematical formulations, layer-wise descriptions, and end-to-end complexity characterizations. We conduct over 1500 controlled experiments on ten synthetic time series under varying patch lengths, forecast horizons, and noise levels. The results reveal clear and reproducible performance regimes: PatchTST Standard achieves the best overall accuracy and noise robustness, Autoformer variants excel on smooth and trend-dominated signals, and Informer variants exhibit sensitivity to noise and long horizons despite improved scalability. Complementing the empirical analysis, we derive new theoretical results that quantify noise attenuation, bias–variance trade-offs, and approximation–complexity guarantees specific to each architectural family. Finally, we demonstrate that these compact Transformer variants serve as effective and interpretable temporal encoders within an operator-theoretic forecasting framework. By embedding Autoformer, Informer, and PatchTST backbones into a Koopman-based latent dynamics model, we extend their applicability beyond synthetic benchmarks to real-world climate, cryptocurrency, and electricity generation time series. Together, these results position compact, modular Transformers as scalable and theoretically grounded building blocks for scientific time series forecasting.
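The Koopman-based latent dynamics idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a simple delay embedding stands in for the Transformer encoder, the linear operator is fitted by ordinary least squares, and all names (`delay_embed`, `fit_koopman`, `forecast`) and the sinusoidal test signal are assumptions chosen for illustration only.

```python
import numpy as np

def delay_embed(series, dim):
    """Stack `dim` consecutive samples into each latent state.

    Assumption: this stands in for a learned Transformer encoder.
    """
    n = len(series) - dim + 1
    return np.stack([series[i:i + dim] for i in range(n)])  # shape (n, dim)

def fit_koopman(latents):
    """Least-squares fit of K such that latents[t + 1] ~ latents[t] @ K."""
    K, *_ = np.linalg.lstsq(latents[:-1], latents[1:], rcond=None)
    return K

def forecast(series, dim, horizon):
    """Roll the fitted linear latent dynamics forward `horizon` steps."""
    latents = delay_embed(series, dim)
    K = fit_koopman(latents)
    z = latents[-1]
    preds = []
    for _ in range(horizon):
        z = z @ K            # one step of linear latent dynamics
        preds.append(z[-1])  # last coordinate is the newest sample
    return np.array(preds)

# Hypothetical noise-free synthetic signal: a single sinusoid.
t = np.arange(200)
signal = np.sin(0.1 * t)

pred = forecast(signal[:150], dim=2, horizon=50)
max_err = np.max(np.abs(pred - signal[150:]))
print(f"max absolute forecast error: {max_err:.2e}")
```

A noise-free sinusoid obeys an exact two-term linear recurrence, so a two-dimensional delay state is enough for the linear operator to reproduce it; noisy or multi-component signals would presumably need a richer encoder, which is the role the abstract assigns to the compact Transformer backbones.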

Files

1-s2.0-S0925231226005370-main.... (pdf | 10.5 Mb)