Hardware-informed reinforcement learning for quantum gate scheduling

Bachelor Thesis (2026)

Author(s)

J.K. Pietrzak (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Sebastian Feld – Mentor (TU Delft - QCD/Feld Group)

Akash Kundu – Mentor (TU Delft - QCD/Feld Group)

M.T.J. Spaan – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Anna Lukina – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Reinforcement Learning, Quantum computing, Quantum compilation

To reference this document use

https://resolver.tudelft.nl/uuid:52c68e87-dbaa-4cb6-8ad3-fd8b35575a92

Publication Year

2026

Language

English

Graduation Date

26-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Quantum gate scheduling assigns start cycles to quantum-circuit operations while respecting precedence, resource, and hardware constraints. Schedules are commonly evaluated by makespan, but makespan is only an indirect proxy for execution reliability: schedules of equal duration may differ in gate errors, idle-time decoherence, and crosstalk exposure. This thesis investigates whether reinforcement learning benefits from domain knowledge in quantum gate scheduling. Building on qgym’s scheduling environment, we evaluate Maskable Proximal Policy Optimization against greedy ASAP and ALAP baselines on Random, GHZ, QFT, and QAOA circuit families using IBM calibration data. We study two forms of domain knowledge: commutation-awareness, which relaxes unnecessary ordering constraints between commuting gates, and hardware-awareness, which injects calibration data through extended observations and/or a log-ESP-based reward. The main finding is that commutation-awareness is the most reliable improvement: it reduces makespan by approximately 20% for QAOA and Random circuits, while giving little benefit for GHZ and QFT circuits. In addition, a noise-aware observation space appears promising for further research.
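
As a rough illustration of the quantities named in the abstract, the sketch below schedules a small GHZ-style circuit with a greedy ASAP rule, then computes its makespan and a simplified log-ESP. It is not the thesis's implementation and does not use qgym: the function names, gate durations, and error rates are hypothetical, and the log-ESP here assumes the estimated success probability is just the product of per-gate success probabilities, ignoring idle-time decoherence and crosstalk.

```python
import math

def asap_schedule(gates, durations):
    """Greedy ASAP: start each gate as soon as all of its qubits are free.

    gates: list of (gate_name, qubit_tuple) in program order.
    durations: hypothetical gate durations in cycles, keyed by gate name.
    """
    free_at = {}  # qubit -> first cycle at which it is idle
    starts = []
    for name, qubits in gates:
        start = max(free_at.get(q, 0) for q in qubits)
        starts.append(start)
        for q in qubits:
            free_at[q] = start + durations[name]
    return starts

def makespan(gates, durations, starts):
    """Cycle at which the last gate finishes."""
    return max(s + durations[name] for (name, _), s in zip(gates, starts))

def log_esp(gates, error_rates):
    """Simplified log-ESP: sum of log(1 - error) over all gates (assumption)."""
    return sum(math.log1p(-error_rates[name]) for name, _ in gates)

# Three-qubit GHZ preparation: H on q0, then a chain of CNOTs.
gates = [("h", (0,)), ("cx", (0, 1)), ("cx", (1, 2))]
durations = {"h": 1, "cx": 2}            # illustrative values only
error_rates = {"h": 0.001, "cx": 0.01}   # illustrative values only

starts = asap_schedule(gates, durations)
span = makespan(gates, durations, starts)
score = log_esp(gates, error_rates)
print(starts, span, score)
```

Because this simplified log-ESP depends only on which gates are executed, it is identical for every schedule of the same circuit. A reward meant to distinguish schedules of equal makespan would also need schedule-dependent terms such as idle time, which this sketch omits.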

Files

Jakub_Pietrzak_-Bachelor_thesi... (pdf)

(pdf | 3.52 Mb)

License info not available