The Human Factor: Addressing Diversity in Reinforcement Learning from Human Feedback
How can RLHF deal with possibly conflicting feedback?
J. PAEZ FRANCO (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A. Mone – Mentor (TU Delft - Interactive Intelligence)
Luciano C. Cavalcante Siebert – Mentor (TU Delft - Interactive Intelligence)
Wendelin Böhmer – Graduation committee member (TU Delft - Sequential Decision Making)
Abstract
Reinforcement Learning from Human Feedback (RLHF) is a promising approach to training agents to perform complex tasks by incorporating human feedback. However, the quality and diversity of this feedback can significantly affect the learning process: humans differ widely in their preferences, expertise, and capabilities. This paper investigates how conflicting feedback affects agent performance. We analyse the impact of environment complexity and examine several query selection strategies. Our results show that even minimal conflicting feedback rapidly degrades RLHF performance in simple environments, and that current query selection strategies are ineffective at handling feedback diversity. We conclude that addressing feedback diversity is crucial for RLHF and that alternative reward modelling approaches are needed. Full code is available on GitHub.