Program Synthesis from Rewards with Probe

Adjusting Probe to Increase Exploration When Synthesising Programs from Rewards in Minecraft

More Info
expand_more

Abstract

Program synthesis is the task of generating a program that satisfies some specification. An important aspect of program synthesis is the method of specification. There are various ways in which a desired program can be specified, such as I/O examples, traces, and natural language. This research paper aims to explore a novel method of specifying a desired program in program synthesis -- rewards. This concept is explored by adjusting the Probe program synthesiser to solve the dense navigation environments in MineRL. In order to avoid local maxima, it is necessary to increase the amount of exploration. To that end, different ways of increasing exploration were tested by changing the parameters of Probe. By increasing the amount of exploration, it is possible to solve more environments, or solve them faster. But increasing exploration could also have the opposite effect, depending on the environment.

Files