This thesis investigates the minute-level operation of a 40 MWh battery in the Continuous Intraday Market. The study focuses on the Dutch market within its European cross-border setting and formulates dispatch as a finite-horizon Markov Decision Process that captures both battery physics and order-book depth. A continuous action-to-power mapping guarantees feasibility by enforcing state-of-charge, efficiency, and market liquidity constraints. Two deep reinforcement learning methods, TD3 and TD3 with behaviour cloning (TD3+BC), are implemented and benchmarked against a rolling-intrinsic (RI) optimiser, which serves as both baseline and source of expert trajectories. Minute-level resolution proves empirically justified: the RI benchmark at one-minute granularity consistently outperforms its fifteen-minute counterpart. In comparative experiments, TD3+BC outperforms plain TD3 and narrows the gap to RI, reaching within about 4% of its profit on average, though not consistently surpassing it. The learned policy exhibits a distinct cycle-efficient trading style with fewer equivalent full cycles and lower throughput than RI but higher value extracted per cycle, which translates over the project lifetime into a stronger business case, yielding an internal rate of return approximately twice that of the RI baseline. Training TD3+BC for five million steps is computationally tractable on a standard laptop (≈11 hours), and inference runs in milliseconds per step, confirming real-time deployability. The overall framework thus demonstrates that deep reinforcement learning can scale to realistic battery sizes and minute-level trading, yielding stable and interpretable policies. At the same time, the persistent strength of the RI optimiser highlights the value of structural priors, suggesting that hybrid approaches combining reinforcement learning with optimisation principles may offer the most promising path toward robust, market-ready battery trading systems.
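To make the idea of a feasibility-preserving action-to-power mapping concrete, the following is a minimal sketch and not the mapping used in the thesis. It assumes a normalised action in [-1, 1] (positive for charging, negative for discharging), a one-way efficiency `eta`, a power rating `p_max_mw`, and order-book depths `ask_depth_mw` and `bid_depth_mw`; all of these names and the numerical values of `p_max_mw` and `eta` are illustrative assumptions. Only the 40 MWh capacity and the one-minute step come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Battery:
    capacity_mwh: float = 40.0  # battery size stated in the abstract
    p_max_mw: float = 20.0      # ASSUMPTION: power rating is not given in the text
    eta: float = 0.95           # ASSUMPTION: one-way charge/discharge efficiency
    dt_h: float = 1.0 / 60.0    # one-minute decision step, in hours


def action_to_power(a, soc_mwh, ask_depth_mw, bid_depth_mw, bat=Battery()):
    """Scale a normalised action to a grid-side power (MW) that is always feasible."""
    a = max(-1.0, min(1.0, a))  # clip the raw policy output to [-1, 1]
    if a >= 0.0:
        # Charging: bounded by rating, remaining storage headroom, and ask-side volume.
        headroom_mw = (bat.capacity_mwh - soc_mwh) / (bat.eta * bat.dt_h)
        limit_mw = min(bat.p_max_mw, headroom_mw, ask_depth_mw)
    else:
        # Discharging: bounded by rating, deliverable stored energy, and bid-side volume.
        available_mw = soc_mwh * bat.eta / bat.dt_h
        limit_mw = min(bat.p_max_mw, available_mw, bid_depth_mw)
    return a * max(limit_mw, 0.0)


def step_soc(soc_mwh, power_mw, bat=Battery()):
    """Advance the state of charge by one step under grid-side power power_mw."""
    if power_mw >= 0.0:
        return soc_mwh + bat.eta * power_mw * bat.dt_h  # charging losses
    return soc_mwh + power_mw * bat.dt_h / bat.eta      # discharging losses


if __name__ == "__main__":
    p = action_to_power(1.0, soc_mwh=39.9, ask_depth_mw=100.0, bid_depth_mw=100.0)
    print(p, step_soc(39.9, p))
```

Because the action only scales a bound that is already feasible, every policy output maps to an admissible trade, so no penalty term or post-hoc projection is needed in this sketch. A real order book has price-dependent depth, which this single-number depth does not capture.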