This thesis addresses profile-based optimization of longitudinal beam dynamics in the CERN Proton Synchrotron Booster (PSB) at two levels of complexity. In the 1D case (double-harmonic), the task is to infer and correct the second-harmonic phase φ2 to achieve optimal bunch lengthening throughout the acceleration cycle; the operational diversity of longitudinal profiles in this regime required a supervised-learning dataset covering representative conditions. In the 5D case (triple-harmonic), the goal is to jointly optimize V1, V2, V3, φ2, and φ3 to obtain a flat-topped, lengthened bunch under stability constraints. Prior Bayesian Optimization (BO) studies identified a Kullback–Leibler (KL) divergence over the central charge region as a usable objective, removing the need for supervised labels in 5D.
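As a concrete illustration of the 5D objective, the following minimal sketch computes a KL divergence restricted to the central charge region of a discretized profile. The window fraction, argument names, and normalization choices are assumptions for illustration, not the exact definition used in the thesis.

```python
import numpy as np

def kl_objective(profile, target, central_frac=0.6, eps=1e-12):
    """KL divergence between a measured and a target line density,
    restricted to the central charge region of the bunch.

    `central_frac` and the symmetric windowing are illustrative
    assumptions, not the thesis implementation."""
    n = len(profile)
    lo = int(n * (1.0 - central_frac) / 2.0)
    hi = n - lo
    p = np.clip(np.asarray(profile[lo:hi], dtype=float), eps, None)
    q = np.clip(np.asarray(target[lo:hi], dtype=float), eps, None)
    p /= p.sum()  # normalize to probability densities over the window
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))
```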
For the double-harmonic optimization, a Convolutional Neural Network (CNN) with Convolutional Block Attention Modules (CBAM) was developed to map measured profiles directly to φ2 corrections as a single-shot regressor, trained on a pre-made dataset of simulated profiles. Training leveraged a cosine loss, intensity-agnostic max-normalization, stratified training–validation splits, and realistic profile augmentation. A double-layered hyperparameter optimization was performed with the BO framework Optuna using the Tree-structured Parzen Estimator (TPE), with feature importance assessed via Random Forest-based fANOVA and Mean Decrease in Impurity (MDI). When applied iteratively, the single-shot regressor lacked an internal notion of convergence, so a decaying-corrections scheme tempered its updates. In simulation and in on-machine validation in the PSB, CBAM produced strong approximations to operator-tuned phases within sub–super-cycle latency, at times exceeding manual phasing performance within 10 iterations (~5 minutes in the PSB). However, its sensitivity to initialization and its limited corrective amplitude under persistent noise motivated a Reinforcement Learning (RL) alternative better suited to the unlabelled 5D case.
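The cosine loss and the decaying-corrections scheme can be sketched as follows; the decay factor, iteration count, and callback names are illustrative assumptions, not the values or interfaces used in the thesis.

```python
import numpy as np

def cosine_loss(phi_pred, phi_true):
    """Periodic phase loss: 1 - cos(dphi) treats phi and phi + 2*pi as the
    same angle, avoiding the wrap-around discontinuity of MSE on raw phases."""
    return 1.0 - np.cos(phi_pred - phi_true)

def iterate_with_decay(get_correction, apply_phase, phi0, n_iter=10, decay=0.7):
    """Decaying-corrections scheme: geometrically shrink the single-shot
    regressor's updates, since it has no internal notion of convergence.
    `decay` and the callbacks are illustrative, not thesis values."""
    phi = phi0
    for k in range(n_iter):
        dphi = get_correction()          # CNN inference on the latest profile
        phi = phi + (decay ** k) * dphi  # tempered update
        apply_phase(phi)                 # push the new phase to machine or sim
    return phi
```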
A recurrent Long Short-Term Memory–Twin Delayed Deep Deterministic policy gradient (LSTM–TD3) framework was introduced to enable profile-based, continuous iterative control without supervised targets. Methodological advances included a learnable soft-threshold gate on profiles (applied in both the actor and the critic), Prioritized Experience Replay (PER), action bounding, and twin critics trained with the Huber loss instead of Mean Squared Error to hedge against the Q-network overestimation bias that PER can amplify. To our knowledge, this is the only open-source LSTM–TD3 implementation that combines this loss choice with parallelized environments and PER.
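A minimal PyTorch sketch of two of these components, the learnable soft-threshold gate and the PER-weighted twin-critic Huber loss, is given below; the parameterization, initial threshold, and function names are assumptions for illustration rather than the thesis implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftThresholdGate(nn.Module):
    """Learnable soft-threshold gate on input profiles (used in both actor
    and critic); this parameterization is a sketch, not necessarily the
    one used in the thesis."""
    def __init__(self, init_tau=0.05):
        super().__init__()
        self.tau = nn.Parameter(torch.tensor(init_tau))  # learnable threshold

    def forward(self, x):
        # Soft-thresholding shrinks small values to zero, suppressing
        # baseline noise while preserving the bunch signal.
        return torch.sign(x) * F.relu(x.abs() - self.tau)

def twin_critic_loss(q1, q2, target_q, is_weights):
    """Twin-critic update with the Huber (smooth L1) loss instead of MSE,
    weighted by PER importance-sampling weights to correct sampling bias."""
    l1 = F.smooth_l1_loss(q1, target_q, reduction="none")
    l2 = F.smooth_l1_loss(q2, target_q, reduction="none")
    return (is_weights * (l1 + l2)).mean()
```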
For the φ2 phasing problem, the RL agent used a simple phase-centric reward focused on convergence and beam-loss prevention. In simulation, the optimised agent outperformed CBAM, producing better φ2 phasing in fewer iterations and with enhanced robustness to different impedances and initial conditions. Due to time constraints, only an earlier, unoptimised variant was validated in the PSB; its behaviour (including over-corrections over the cycle and dependence on initialization) matched verification expectations and was used to extrapolate the optimised agent's expected on-machine performance. The results highlighted the value of the gating and Huber-loss redesign, with hyperparameter optimization identifying the TD3 training parameters as the most influential.
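A hedged sketch of what such a phase-centric reward might look like is shown below; the functional form and the penalty constant are assumptions, not the reward used in the thesis.

```python
import numpy as np

def phase_reward(phi_err, beam_lost, loss_penalty=10.0):
    """Hypothetical phase-centric reward: the wrapped absolute phase error
    drives convergence, and a large penalty discourages beam loss.
    Functional form and constants are assumptions, not thesis values."""
    wrapped = np.angle(np.exp(1j * phi_err))  # map the phase error into (-pi, pi]
    reward = -abs(wrapped)
    if beam_lost:
        reward -= loss_penalty
    return reward
```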
In triple-harmonic optimization, the same recurrent off-policy agent was trained on observations comprising the normalized profile, the normalized radio-frequency (RF) parameters (V1, V2, V3, φ2, φ3), and a normalized magnetic-field ramp rate. Reward shaping followed a stability-then-shape paradigm: bucket-area sufficiency first, then the KL objective. The TD3 training parameters were extrapolated from the 1D case. In simulation, the agent converged in under 20 corrections to effective phases and amplitudes, achieving approximate triple-peak matching within tolerance and demonstrating that RL can solve the 5D optimization without supervised labels while learning safety-centred strategies. As in the 1D case, the agent was robust to different initialization conditions and impedances.
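The stability-then-shape shaping can be illustrated with a minimal sketch; the staging logic, weighting, and constants are assumptions rather than the thesis implementation.

```python
def shaped_reward(bucket_area, min_area, kl_value, stability_weight=1.0):
    """Stability-then-shape sketch: penalize insufficient bucket area
    first; only once the bucket is large enough does the (negative) KL
    shape objective take over. Staging logic and constants are assumed."""
    if bucket_area < min_area:
        # Stability stage: reward improves as the bucket approaches sufficiency.
        return -stability_weight * (1.0 - bucket_area / min_area)
    # Shape stage: smaller KL divergence from the flat-topped target is better.
    return -kl_value
```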
The project delivered embedded RF-parameter control over the cycle, a data-acquisition and preprocessing pipeline, operational scripts with monitoring, and publicly released training code with supporting documentation. Although the optimised agents (1D and 5D) could not be validated on the machine within the available time, the verification results and safety mechanisms indicate readiness for testing and a high likelihood of adequate performance. Future work includes PSB validation of the optimised agents, further latency reduction, online learning in the PSB for the triple-harmonic agent, and tests of transferability to other accelerators.