Program Synthesis from Rewards with Probe
Adjusting Probe to Increase Exploration When Synthesising Programs from Rewards in Minecraft
N.M. Mikk (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Sebastijan Dumančić – Mentor (TU Delft - Algorithmics)
T.R. Hinnerichs – Mentor (TU Delft - Algorithmics)
Wendelin Böhmer – Graduation committee member (TU Delft - Sequential Decision Making)
Abstract
Program synthesis is the task of generating a program that satisfies a given specification. An important aspect of program synthesis is therefore how that specification is expressed. A desired program can be specified in various ways, such as I/O examples, traces, and natural language. This paper explores a novel way of specifying the desired program: rewards. The idea is studied by adapting the Probe program synthesiser to solve the dense navigation environments in MineRL. Because a search guided by rewards can get stuck in local maxima, the amount of exploration has to be increased. To that end, several ways of increasing exploration were tested by changing the parameters of Probe. Increasing exploration makes it possible to solve more environments, or to solve them faster, but depending on the environment it can also have the opposite effect.
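The role of exploration when a program is specified by rewards can be illustrated with a small sketch. It is neither Probe nor MineRL: the grid, the move set, the reward (reduction in Manhattan distance to the goal) and the `synthesise` function are all hypothetical stand-ins, and the beam width is used here only as a simple proxy for the amount of exploration.

```python
from typing import Optional

# Hypothetical toy environment: S = start, G = goal, # = wall.
GRID = [
    "S.#G",
    "..#.",
    "....",
]
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
START, GOAL = (0, 0), (0, 3)


def step(pos, move):
    """Apply one move; bumping into a wall or the border leaves the agent in place."""
    r, c = pos[0] + MOVES[move][0], pos[1] + MOVES[move][1]
    if 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] != "#":
        return (r, c)
    return pos


def distance(pos):
    """Manhattan distance to the goal (ignores walls, hence the local maximum)."""
    return abs(pos[0] - GOAL[0]) + abs(pos[1] - GOAL[1])


def run(program):
    """Execute a program (a string of moves); return final position and total reward."""
    pos, total = START, 0
    for move in program:
        new_pos = step(pos, move)
        total += distance(pos) - distance(new_pos)  # dense reward per step
        pos = new_pos
    return pos, total


def synthesise(beam_width, max_len=12) -> Optional[str]:
    """Reward-guided beam search over move sequences.

    Programs ending in the same cell are treated as equivalent and only one is kept;
    in this toy the total reward depends only on the final cell, so nothing is lost.
    """
    beam = [""]
    for _ in range(max_len):
        best_by_pos = {}
        for program in beam:
            for move in MOVES:
                candidate = program + move
                pos, reward = run(candidate)
                if pos == GOAL:
                    return candidate
                if pos not in best_by_pos:
                    best_by_pos[pos] = (reward, candidate)
        # Keep only the highest-reward candidates: a small beam means little exploration.
        ranked = sorted(best_by_pos.values(), key=lambda rc: -rc[0])
        beam = [cand for _, cand in ranked[:beam_width]]
    return None


if __name__ == "__main__":
    print(synthesise(beam_width=1))   # greedy: gets stuck next to the wall
    print(synthesise(beam_width=10))  # wide enough to keep every reachable cell
```

With a beam width of 1 the search keeps following the reward, ends up pressed against the wall and returns `None`. A beam width of 10 retains the lower-reward detour around the wall and finds a program that reaches the goal, at the cost of evaluating more candidates per step. That extra cost is one way in which more exploration can slow a search down in environments where the greedy path would already have worked.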