
J.W. Böhmer

27 records found

Motivation: Clustering is an unsupervised learning task with broad applications. Traditional clustering methods often rely on point estimates of model parameters, which can limit their ability to capture uncertainty. Bayesian clustering addresses this by incorporating unce ...
Reinforcement Learning is a powerful tool for problems that require sequential decision-making. However, it often faces challenges due to the extensive need for reward engineering. Reinforcement Learning from Human Feedback (RLHF) and Inverse Reinforcement Learning (IRL) hold the ...
Reinforcement Learning from Human Feedback (RLHF) offers a powerful approach to training agents in environments where defining an explicit reward function is challenging by learning from human feedback provided in various forms. This research evaluates three common feedback types ...
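
The most common feedback format in the RLHF literature is a pairwise preference over trajectory segments. As a generic, hedged illustration of fitting a reward model to such preferences via the Bradley-Terry model (not the method evaluated in this thesis; the network sizes and data below are invented):

    # Sketch: reward learning from pairwise preferences (Bradley-Terry).
    # Illustration only, not this thesis's method. Data is synthetic.
    import torch
    import torch.nn as nn

    reward_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(reward_net.parameters(), lr=1e-3)

    def preference_loss(seg_a, seg_b, a_preferred):
        # Segment return = sum of per-state rewards;
        # Bradley-Terry: P(a preferred over b) = sigmoid(R(a) - R(b)).
        logit = reward_net(seg_a).sum() - reward_net(seg_b).sum()
        target = torch.tensor(1.0 if a_preferred else 0.0)
        return nn.functional.binary_cross_entropy_with_logits(logit, target)

    # One gradient step on a synthetic preference between two
    # 10-step segments of 4-dimensional states.
    seg_a, seg_b = torch.randn(10, 4), torch.randn(10, 4)
    loss = preference_loss(seg_a, seg_b, a_preferred=True)
    opt.zero_grad(); loss.backward(); opt.step()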
Reinforcement Learning from Human Feedback (RLHF) is a promising approach to training agents to perform complex tasks by incorporating human feedback. However, the quality and diversity of this feedback can significantly impact the learning process. Humans are highly diverse in t ...
The main concept behind reinforcement learning is that an agent takes actions and is rewarded or punished for them. However, the rewards involved in performing a given task can be quite complicated in real life, and the contribution of different facto ...
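
For readers new to the setup this abstract describes, the agent-environment loop looks roughly as follows; a minimal sketch using the Gymnasium API, with a random policy standing in for a learned one:

    # Minimal agent-environment loop (Gymnasium API). The random policy
    # is a placeholder, not the approach of the thesis above.
    import gymnasium as gym

    env = gym.make("CartPole-v1")
    obs, info = env.reset(seed=0)
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()   # agent takes an action
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward               # reward or punishment signal
        done = terminated or truncated
    env.close()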
Proteins are fundamental biological macromolecules essential for cellular structure, enzymatic catalysis, and immune defense, making the generation of novel proteins crucial for advancements in medicine, biotechnology, and material sciences. This study explores protein design usi ...

Conflict in the World of Inverse Reinforcement Learning

Investigating Inverse Reinforcement Learning with Conflicting Demonstrations

Inverse Reinforcement Learning (IRL) algorithms are closely related to Reinforcement Learning (RL) but instead try to model the reward function from a given set of expert demonstrations. In IRL, many algorithms have been proposed, but most assume consistent demonstrations. Consis ...
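
To make the setting concrete: with a reward assumed linear in state features, a classic IRL recipe picks reward weights that separate expert trajectories from the current policy's (cf. apprenticeship learning). The sketch below shows that generic idea on synthetic data; it is not the algorithm studied in this thesis:

    # Sketch of the feature-expectation matching behind many IRL methods.
    # All trajectories here are synthetic feature vectors.
    import numpy as np

    rng = np.random.default_rng(0)

    def feature_expectations(trajectories):
        # Average feature vector over all visited states.
        return np.mean([phi for traj in trajectories for phi in traj], axis=0)

    expert_trajs = [rng.normal(1.0, 0.5, size=(20, 3)) for _ in range(5)]
    policy_trajs = [rng.normal(0.0, 0.5, size=(20, 3)) for _ in range(5)]

    # The reward direction is the gap between expert and policy feature
    # expectations; expert behaviour scores higher under it.
    w = feature_expectations(expert_trajs) - feature_expectations(policy_trajs)
    w /= np.linalg.norm(w)
    reward = lambda phi: w @ phi   # estimated reward, linear in features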

Program Synthesis from Game Rewards Using FrAngel

Finding Complex Subprograms for Solving Minecraft

Program synthesis has been extensively used for automating code-related tasks, but it has yet to be applied in the realm of reward-based games. FrAngel is a component-based program synthesizer that addresses the aspects of exploration and exploitation, both important for the perf ...

Program Synthesis from Rewards with Probe

Adjusting Probe to Increase Exploration When Synthesising Programs from Rewards in Minecraft

Program synthesis is the task of generating a program that satisfies some specification. An important aspect of program synthesis is the method of specification. There are various ways in which a desired program can be specified, such as I/O examples, traces, and natural language ...
Program synthesis remains largely unexplored in the context of playing games, where exploration and exploitation are crucial for solving tasks within complex environments. FrAngel is a program synthesis algorithm that addresses both of these aspects with its fragments used for th ...
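
Both Probe and FrAngel refine a bottom-up enumerative search. As a hedged, much-simplified illustration of synthesis from I/O examples (the grammar and examples below are invented):

    # Toy bottom-up enumerative synthesizer over a tiny arithmetic
    # grammar, checked against I/O examples. Far simpler than Probe or
    # FrAngel, which add cost guidance and fragment reuse respectively.
    import itertools

    examples = [(1, 3), (2, 5), (3, 7)]    # target behaviour: x -> 2*x + 1

    def synthesize(max_rounds=3):
        programs = {("x", lambda x: x), ("1", lambda x: 1), ("2", lambda x: 2)}
        for _ in range(max_rounds):
            new = set()
            for (sa, fa), (sb, fb) in itertools.product(programs, repeat=2):
                new.add((f"({sa} + {sb})", lambda x, fa=fa, fb=fb: fa(x) + fb(x)))
                new.add((f"({sa} * {sb})", lambda x, fa=fa, fb=fb: fa(x) * fb(x)))
            programs |= new
            for src, fn in programs:
                if all(fn(i) == o for i, o in examples):
                    return src
        return None

    print(synthesize())   # e.g. "((x + x) + 1)", equivalent to 2*x + 1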

Program Synthesis from Rewards using Probe and FrAngel

Impact of Exploration-Exploitation Configurations on Probe and FrAngel in Minecraft

Program synthesis involves finding a program that meets the user intent, typically provided as input/output examples or formal mathematical specifications. This paper explores a novel specification in program synthesis: learning from rewards. We explore existing synthesizer ...
Advancing protein design is crucial for breakthroughs in medicine and biotechnology, yet traditional approaches often fall short by focusing solely on representing protein sequences using the 20 canonical amino acids. This thesis explores discrete diffusion models for generating ...

Activity Progress Prediction

Is there progress in video progress prediction methods?

In this paper, we investigate the behaviour of current progress prediction methods on the benchmark datasets in common use. We show that these methods can fail to extract useful information from visual data on these datasets. Moreover, when the methods fail to ...
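
A useful reference point when reading this claim: a baseline that ignores visual data entirely and predicts progress from temporal position alone. A hypothetical sketch:

    # Frame-index baseline: predicts progress with no visual input,
    # only the position within the video. Hypothetical reference point,
    # not a method from the paper above.
    def frame_index_progress(frame_idx: int, num_frames: int) -> float:
        """Predicted progress in (0, 1] from temporal position alone."""
        return (frame_idx + 1) / num_frames

    print(frame_index_progress(49, 100))   # halfway through -> 0.5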
Operation and maintenance of the built environment have a major effect on socioeconomic stability and sustainability. A significant part of our built world is approaching or has well exceeded its designated structural life. As engineers, we need to find efficient ways to extend this ...
Experience replay for off-policy reinforcement learning has been shown to improve sample efficiency and stabilize training. However, typical uniformly sampled replay includes many samples that are irrelevant to the agent reaching good performance. We introduce Action Sensitive Experience ...
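
For context, the uniformly sampled replay this abstract contrasts against looks like the sketch below (not the proposed Action Sensitive variant):

    # Minimal uniform experience replay buffer: every stored transition
    # is equally likely to be sampled, relevant or not. Baseline sketch.
    import random
    from collections import deque

    class ReplayBuffer:
        def __init__(self, capacity=100_000):
            self.buffer = deque(maxlen=capacity)   # oldest entries fall out

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            # Uniform sampling: no notion of how useful a transition is.
            batch = random.sample(self.buffer, batch_size)
            return tuple(zip(*batch))   # states, actions, rewards, ...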
Neural networks are commonly initialized to keep the theoretical variance of the hidden pre-activations constant, in order to avoid the vanishing and exploding gradient problem. Though this condition is necessary to train very deep networks, numerous analyses have shown that it is no ...
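
The condition the abstract refers to can be checked empirically: with He initialization (weight variance 2/fan_in) and ReLU activations, the pre-activation variance stays roughly constant across layers. A small demonstration on synthetic data:

    # Empirical check of variance-preserving (He) initialization: the
    # printed pre-activation variance stays roughly constant with depth.
    import numpy as np

    rng = np.random.default_rng(0)
    width, depth = 512, 10
    h = rng.normal(size=(1000, width))             # synthetic input batch

    for layer in range(depth):
        W = rng.normal(scale=np.sqrt(2.0 / width), size=(width, width))
        z = h @ W                                  # pre-activations
        print(f"layer {layer}: Var(z) = {z.var():.3f}")   # ~constant
        h = np.maximum(z, 0.0)                     # ReLU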
Autonomous robots have been widely applied in search and rescue missions to gather information about target locations. This process must be continuously replanned based on new observations in the environment. For dynamic targets, the robot needs not only to discover them ...
Language is an intuitive and effective way for humans to communicate. Large Language Models (LLMs) can interpret and respond well to language. However, their use in deep reinforcement learning is limited, as they are sample-inefficient. State-of-the-art deep reinforcement learning ...
Effect Handler Oriented Programming is a promising new programming paradigm, delivering separation of concerns with regard to side effects in an otherwise functional environment. This paper discusses the applicability of this new paradigm to static code analysis programs. ...