Search Inspired Exploration for Reinforcement Learning

Master Thesis (2025)

Author(s)

Georgios Sotirchos (TU Delft - Mechanical Engineering)

Contributor(s)

J. Kober – Mentor (TU Delft - Learning & Autonomous Control)

Zlatan Ajanovic – Mentor (RWTH Aachen University)

R. Babuska – Graduation committee member (TU Delft - Learning & Autonomous Control)

Faculty

Mechanical Engineering

Reinforcement Learning, Exploration, Search

To reference this document use:

https://resolver.tudelft.nl/uuid:a00f9d71-8a6d-4f0e-af52-0132758c08ac

Publication Year

2025

Language

English

Graduation Date

14-11-2025

Awarding Institution

Delft University of Technology

Programme

Mechanical Engineering | Vehicle Engineering | Cognitive Robotics

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Exploration in environments with sparse rewards remains a fundamental challenge for reinforcement learning (RL). Existing approaches such as curriculum learning and Go-Explore often rely on hand-crafted heuristics, while curiosity-driven methods risk converging to suboptimal policies. We propose Search-Inspired Exploration in Reinforcement Learning (SIERL), a novel method that actively guides exploration by setting sub-goals based on the agent's learning progress. At the beginning of each episode, SIERL chooses a sub-goal from the frontier (the boundary of the agent's known state space) before the agent continues exploring toward the main task objective. The key contribution of our method is the sub-goal selection mechanism, which provides state-action pairs that are neither overly familiar nor completely novel. This ensures that the frontier is expanded systematically and that the agent is capable of reaching any state within it. Inspired by search, SIERL prioritizes sub-goals on the frontier using estimates of cost-to-come and cost-to-go, steering exploration towards the most informative regions. In experiments on challenging sparse-reward environments, SIERL outperforms dominant baselines both in achieving the main task goal and in generalizing to reach arbitrary states in the environment.
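
The prioritization described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each frontier entry carries scalar estimates of cost-to-come and cost-to-go and that the lowest sum is preferred, as in A*-style search. The abstract does not state how the two estimates are combined, and the names `FrontierEntry` and `select_subgoal` are hypothetical rather than taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrontierEntry:
    """Hypothetical record for one candidate sub-goal on the frontier."""
    state: tuple          # e.g. a grid cell (x, y)
    cost_to_come: float   # estimated cost from the start state to this state
    cost_to_go: float     # estimated cost from this state to the main goal


def select_subgoal(frontier):
    """Return the entry with the lowest cost_to_come + cost_to_go.

    Assumption: the two estimates are simply added, as in A*.
    The thesis may combine or weight them differently.
    """
    if not frontier:
        raise ValueError("frontier is empty")
    return min(frontier, key=lambda e: e.cost_to_come + e.cost_to_go)


# Toy frontier with made-up cost estimates.
frontier = [
    FrontierEntry(state=(0, 3), cost_to_come=3.0, cost_to_go=5.0),
    FrontierEntry(state=(2, 2), cost_to_come=4.0, cost_to_go=2.0),
    FrontierEntry(state=(4, 0), cost_to_come=4.0, cost_to_go=6.0),
]
best = select_subgoal(frontier)
print(best.state)
```

In this toy frontier the entry at `(2, 2)` has the lowest combined estimate and would be chosen as the episode's sub-goal. In a learning agent, both estimates would presumably come from learned quantities rather than fixed numbers.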

Files

MSc_thesis_G_Sotirchos.pdf

(pdf | 0 Mb)

License info not available

File under embargo until 31-01-2026