Oikonomos-II+

a Reinforcement-Learning, Cloud Resource Recommender for HPC & AI Workloads

Journal Article (2026)
Author(s)

R. E.V. Betting (Erasmus MC)

Q. Chen (Erasmus MC)

C. I. De Zeeuw (Erasmus MC, Netherlands Institute for Neuroscience)

C. Strydis (Erasmus MC, TU Delft - Computer Engineering)

DOI related publication
https://doi.org/10.1109/TCC.2026.3682234 Final published version
Publication Year
2026
Language
English
Journal title
IEEE Transactions on Cloud Computing

Abstract

Oikonomos-II+ is a hybrid, reinforcement-learning system for recommending optimal cloud-instance types for High-Performance Computing (HPC) and Artificial-Intelligence (AI) applications. Unlike existing approaches that require historical data or repeated job executions, Oikonomos-II+ learns online using user-submitted jobs. It combines a modified Neural-LinUCB algorithm with Gaussian-Process regression to model the relationship between job parameters, instance types, and execution time. This allows it to balance exploration and exploitation efficiently, even in the absence of prior data. We evaluated six configurations of Oikonomos-II+ on a diverse set of HPC and AI workloads, optimizing for cost and speed. Results show that the complete system converges to optimal resource choices, outperforming purely predictive or search-based approaches. By treating deployed applications as a black box and by eliminating the need for preexisting training data or auxiliary runs, Oikonomos-II+ provides a general-purpose, low-overhead solution for dynamic resource selection in heterogeneous cloud environments.
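
To illustrate the exploration-exploitation idea the abstract describes, the following is a minimal sketch of a plain, disjoint LinUCB bandit that picks a cloud-instance type for each incoming job and learns only from the jobs it actually runs. It is not the modified Neural-LinUCB algorithm or the Gaussian-Process component of Oikonomos-II+, whose details are not given here. The instance names, speeds, prices, the linear runtime model, and all function and parameter names (`LinUCBArm`, `run_job`, `simulate`, `alpha`, `ridge`) are hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical instance catalogue: relative speed and price per unit of runtime.
INSTANCES = {
    "small": {"speed": 1.0, "price": 1.0},
    "medium": {"speed": 2.0, "price": 1.6},
    "large": {"speed": 4.0, "price": 4.4},
}


class LinUCBArm:
    """One disjoint LinUCB arm, here standing in for one instance type."""

    def __init__(self, dim, ridge=1.0):
        # A_inv holds the inverse of (ridge * I + sum of x x^T).
        self.A_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        # b accumulates reward-weighted context vectors.
        self.b = [0.0] * dim

    def _matvec(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x, alpha):
        """Estimated reward plus an exploration bonus for context x."""
        theta = self._matvec(self.b)
        a_inv_x = self._matvec(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(a * xi for a, xi in zip(a_inv_x, x)), 0.0))
        return mean + alpha * width

    def update(self, x, reward):
        """Rank-one Sherman-Morrison update of A_inv, then update b."""
        a_inv_x = self._matvec(x)
        denom = 1.0 + sum(a * xi for a, xi in zip(a_inv_x, x))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.A_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        for i in range(n):
            self.b[i] += reward * x[i]


def run_job(size, spec, rng):
    """Simulated black-box job: returns the cost of one noisy execution."""
    runtime = size / spec["speed"] * (1.0 + rng.gauss(0.0, 0.05))
    return runtime * spec["price"]


def simulate(rounds=500, alpha=0.5, seed=0):
    """Recommend an instance per job, observe its cost, and learn online."""
    rng = random.Random(seed)
    arms = {name: LinUCBArm(dim=2) for name in INSTANCES}
    counts = {name: 0 for name in INSTANCES}
    for _ in range(rounds):
        size = rng.uniform(0.5, 2.0)   # hypothetical job parameter
        x = [1.0, size]                # context: bias term plus job size
        choice = max(arms, key=lambda name: arms[name].ucb(x, alpha))
        cost = run_job(size, INSTANCES[choice], rng)
        arms[choice].update(x, -cost)  # reward is negative cost
        counts[choice] += 1
    return counts


if __name__ == "__main__":
    print(simulate())
```

Under these assumed prices and speeds, "medium" has the lowest cost per unit of work, so its count should come to dominate as the exploration bonus of the other arms shrinks. No historical data or auxiliary runs are used: every observation comes from a job the recommender itself placed.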