Hardware-informed reinforcement learning for quantum gate scheduling

Bachelor Thesis (2026)
Author(s)

J.K. Pietrzak (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Sebastian Feld – Mentor (TU Delft - QCD/Feld Group)

Akash Kundu – Mentor (TU Delft - QCD/Feld Group)

M.T.J. Spaan – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Anna Lukina – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
26-06-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Quantum gate scheduling assigns start cycles to quantum-circuit operations while respecting precedence, resource, and hardware constraints. Schedules are commonly evaluated by makespan, but makespan is only an indirect proxy for execution reliability: schedules of equal duration may still differ in gate errors, idle-time decoherence, and crosstalk exposure. This thesis investigates whether reinforcement learning for quantum gate scheduling benefits from domain knowledge. Building on qgym’s scheduling environment, we evaluate Maskable Proximal Policy Optimization (PPO) against greedy ASAP and ALAP baselines on Random, GHZ, QFT, and QAOA circuit families using IBM calibration data. We study two forms of domain knowledge: commutation-awareness, which relaxes unnecessary ordering constraints between commuting gates, and hardware-awareness, which injects calibration data through extended observations, a reward based on the log estimated success probability (log-ESP), or both. The main finding is that commutation-awareness is the most reliable improvement: it reduces makespan by approximately 20% for QAOA and Random circuits but yields little benefit for GHZ and QFT circuits. In addition, the noise-aware observation space appears promising for further research.
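The two ideas behind the abstract, relaxing ordering constraints between commuting gates and scoring a schedule by log-ESP rather than makespan alone, can be made concrete with a small sketch. The Python below assumes a hypothetical gate format `(name, qubits, duration)`, a toy commutation rule (two gates commute if both are diagonal in the computational basis, a sufficient but not necessary condition), and an illustrative log-ESP made of log gate fidelities minus a linear idle-time penalty. It builds precedence constraints with and without commutation-awareness, schedules them with a greedy ASAP-style list scheduler, and scores the result. The functions `build_deps`, `asap_schedule`, and `log_esp` are hypothetical and are not taken from qgym or from the thesis.

```python
import math

# Hypothetical gate format: (name, qubits, duration_in_cycles).
# Assumed commutation rule for illustration: gates diagonal in the
# computational basis commute with each other (sufficient, not necessary).
DIAGONAL_GATES = {"rz", "cz", "rzz"}


def build_deps(gates, commutation_aware):
    """For each gate, return the indices of earlier gates it must follow."""
    deps = []
    for j, (name_j, qubits_j, _) in enumerate(gates):
        preds = set()
        for i in range(j):
            name_i, qubits_i, _ = gates[i]
            if not set(qubits_i) & set(qubits_j):
                continue  # disjoint qubits: no ordering constraint
            if commutation_aware and name_i in DIAGONAL_GATES and name_j in DIAGONAL_GATES:
                continue  # commuting pair: ordering constraint relaxed
            preds.add(i)
        deps.append(preds)
    return deps


def asap_schedule(gates, deps):
    """Greedy cycle-by-cycle list scheduler: start each ready gate as early as possible."""
    start = [None] * len(gates)
    finish = [None] * len(gates)
    busy_until = {}  # qubit -> first cycle at which it is free again
    cycle = 0
    while None in start:
        for i, (_, qubits, duration) in enumerate(gates):
            if start[i] is not None:
                continue
            # Precedence: every predecessor must have finished by this cycle.
            if any(finish[p] is None or finish[p] > cycle for p in deps[i]):
                continue
            # Resource: a qubit executes at most one gate at a time, so
            # commuting gates on a shared qubit still run one after another.
            if any(busy_until.get(q, 0) > cycle for q in qubits):
                continue
            start[i], finish[i] = cycle, cycle + duration
            for q in qubits:
                busy_until[q] = finish[i]
        cycle += 1
    return start, max(finish)


def log_esp(gates, start, gate_error, t_coherence):
    """Illustrative log-ESP (assumed form, not the thesis's reward):
    sum of log gate fidelities minus each qubit's idle time / coherence time."""
    total = sum(math.log(1.0 - gate_error[name]) for name, _, _ in gates)
    windows = {}  # qubit -> (first start, last finish, busy cycles)
    for (_, qubits, duration), s in zip(gates, start):
        for q in qubits:
            lo, hi, busy = windows.get(q, (s, s + duration, 0))
            windows[q] = (min(lo, s), max(hi, s + duration), busy + duration)
    for q, (lo, hi, busy) in windows.items():
        total -= (hi - lo - busy) / t_coherence[q]  # idle-time decoherence penalty
    return total


# QAOA-like chain of ZZ interactions on four qubits (hypothetical durations).
gates = [("rzz", (0, 1), 2), ("rzz", (1, 2), 2), ("rzz", (2, 3), 2)]
for aware in (False, True):
    start, makespan = asap_schedule(gates, build_deps(gates, aware))
    print(f"commutation_aware={aware}: start={start}, makespan={makespan}")
```

With the ordering of the original gate list, the three ZZ gates serialize into a makespan of 6 cycles; once the commuting pairs are left unordered, the two outer gates run in parallel and the makespan drops to 4 cycles. These numbers belong to this toy circuit only. Because `log_esp` also charges idle time, two schedules with the same makespan can still receive different scores, which is the gap between makespan and reliability that the abstract points to.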


License info not available