Program Synthesis from Rewards with Probe

Adjusting Probe to Increase Exploration When Synthesising Programs from Rewards in Minecraft

Bachelor Thesis (2024)

Author(s)

N.M. Mikk (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Sebastijan Dumančić – Mentor (TU Delft - Algorithmics)

T.R. Hinnerichs – Mentor (TU Delft - Algorithmics)

Wendelin Böhmer – Graduation committee member (TU Delft - Sequential Decision Making)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Program synthesis, Minecraft, MineRL

To reference this document use:

https://resolver.tudelft.nl/uuid:1f882b7a-c81a-4ac4-ad4f-2ef9410da62a

Publication Year

2024

Language

English

Graduation Date

21-06-2024

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Program synthesis is the task of generating a program that satisfies a given specification. An important aspect of program synthesis is how that specification is expressed: a desired program can be specified through I/O examples, traces, or natural language, among other methods. This paper explores a novel method of specification: rewards. The concept is explored by adapting the Probe program synthesiser to solve the dense navigation environments in MineRL. To avoid local maxima, the synthesiser needs to explore more, so different ways of increasing exploration were tested by changing the parameters of Probe. Increasing exploration makes it possible to solve more environments, or to solve them faster, but depending on the environment it can also have the opposite effect.

Files

Research_paper.pdf

(pdf | 0.18 MB)

License info not available