EfficientTDMPC

Improved MPC Objectives for Sample-Efficient Continuous Control

Master Thesis (2026)

Author(s)

T. Evers (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.H.G. Dauwels – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Robotics Reinforcement learning Machine Lear

To reference this document use

https://resolver.tudelft.nl/uuid:09eba1be-1f17-44a3-ad37-7dbc7e958700

Publication Year

2026

Language

English

Graduation Date

15-06-2026

Awarding Institution

Delft University of Technology

Programme

Electrical Engineering, Signals and Systems

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Model-based reinforcement learning can improve sample efficiency by learning a model of the environment and using it for planning. In continuous-control tasks, this allows an agent to evaluate many candidate action sequences before acting. However, planning with a learned model also introduces a failure mode: the planner optimizes a learned return estimate rather than the true environment return, and can therefore exploit errors in the dynamics model, reward model, or value function.
This thesis studies how the model predictive control (MPC) objective used in TD-MPC-style agents can be made more reliable. The main contribution is EfficientTDMPC, a method that modifies the planner objective by aggregating return estimates across multiple dynamics heads and rollout depths, and by applying disagreement-based pessimism during reanalyze. These changes aim to reduce the variance and exploitability of model-based return estimates while preserving the sample-efficiency benefits of latent model-based planning.
EfficientTDMPC sets a new state of the art on HumanoidBench-Hard and on the hard DeepMind Control Suite (DMC) tasks, while matching the state of the art on the easy DMC tasks. The thesis also discusses adaptive horizon selection as a future direction, arguing that planning depth should be treated as part of the uncertainty-aware planning objective. Overall, the thesis shows that the reliability of the learned planning objective is a central design problem in sample-efficient model-based reinforcement learning.
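
The abstract does not spell out how the return estimates are combined, so the following is only an illustrative sketch of the general idea, not the method from the thesis. It assumes a plain mean over rollout depths and dynamics heads, uses the standard deviation across heads as the disagreement measure, and subtracts it with a hypothetical weight `beta`. The function name `aggregate_return`, the `estimates[head][depth]` layout, and the numbers are all made up for illustration.

```python
from statistics import mean, pstdev


def aggregate_return(estimates, beta=0.0):
    """Combine return estimates and optionally apply a pessimism penalty.

    estimates[h][d] is a return estimate from dynamics head h at rollout
    depth d (hypothetical layout, assumed for this sketch).
    Returns (value, disagreement).
    """
    if not estimates or not all(estimates):
        raise ValueError("need at least one head and one depth per head")
    # Average over rollout depths within each head.
    per_head = [mean(row) for row in estimates]
    # Average over heads to obtain the aggregated estimate.
    aggregated = mean(per_head)
    # Disagreement: population standard deviation of the per-head estimates.
    disagreement = pstdev(per_head)
    # Assumed pessimism rule: penalize the estimate where heads disagree.
    return aggregated - beta * disagreement, disagreement


# Two heads, three rollout depths (made-up numbers).
estimates = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
planner_value, _ = aggregate_return(estimates, beta=0.0)
reanalyze_value, spread = aggregate_return(estimates, beta=1.0)
```

In this sketch `beta=0.0` stands in for a planner that uses the aggregated estimate alone, while `beta>0` stands in for a pessimistic estimate during reanalyze. How the thesis actually weights heads and depths, and what disagreement measure it uses, would need to be taken from the full text.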

Files

Thesis_15_June.pdf

(pdf | 5.14 MB)

License info not available