Natural Language Processing and Reinforcement Learning to Generate Morally Aligned Text

Comparing a moral agent to an optimally playing agent

Bachelor Thesis (2023)
Author(s)

R.A.X.M. Lubbers (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Enrico Liscio – Mentor (TU Delft - Interactive Intelligence)

D. Mambelli – Mentor (TU Delft - Interactive Intelligence)

P.K. Murukannaiah – Mentor (TU Delft - Interactive Intelligence)

J. Yang – Mentor (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Rob Lubbers
Publication Year
2023
Language
English
Graduation Date
03-07-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Large Language Models are becoming increasingly prevalent in society. However, these models act without a sense of morality; they only prioritize accomplishing their goal, and little research has evaluated them in this respect. Current state-of-the-art Reinforcement Learning approaches represent the morality of a statement by a single scalar value. This representation is inaccurate, as multiple features determine how moral a statement is. We leverage the Moral Foundations Theory to represent morality more accurately, using a 5-dimensional vector of moral features. We implement several agents in an environment where decisions with possible moral implications need to be made, each using a different policy for selecting actions. Two policies always pick the most moral or the most immoral action, respectively; two others follow the same preference but also give some weight to game progression. Lastly, we consider an amoral agent that ignores morality entirely. We compare these agents by their completion percentage of the Infocom game Suspect. We find that the amoral agent achieves the highest completion rate, while agents that weigh morality heavily almost immediately get stuck in an infinite loop without progression.
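The thesis implementation itself is in the attached paper; the snippet below is only a minimal illustrative sketch of the kind of action selection the abstract describes, combining a 5-dimensional Moral Foundations vector with a game-progression estimate. All names, the averaging of the five foundations, and the linear weighting scheme are assumptions made for illustration, not the author's method.

```python
import numpy as np

# Hypothetical sketch: score candidate actions by combining a
# 5-dimensional Moral Foundations vector (care, fairness, loyalty,
# authority, sanctity) with an estimate of game progression.
# Names and the weighting scheme are illustrative assumptions only.

MORAL_DIMENSIONS = ["care", "fairness", "loyalty", "authority", "sanctity"]

def select_action(candidates, moral_scores, progress_scores, moral_weight=0.5):
    """Pick the action with the highest weighted sum of morality and progression.

    candidates      -- list of action strings
    moral_scores    -- dict: action -> 5-dim numpy array of foundation scores
    progress_scores -- dict: action -> scalar estimate of game progression
    moral_weight    -- 1.0 mimics a "most moral" policy, 0.0 an amoral one
    """
    def score(action):
        morality = moral_scores[action].mean()   # aggregate the five foundations
        progress = progress_scores[action]
        return moral_weight * morality + (1 - moral_weight) * progress

    return max(candidates, key=score)

# Toy usage with made-up numbers
actions = ["help the witness", "steal the evidence"]
moral = {
    "help the witness": np.array([0.9, 0.8, 0.5, 0.6, 0.7]),
    "steal the evidence": np.array([0.1, 0.2, 0.4, 0.3, 0.2]),
}
progress = {"help the witness": 0.3, "steal the evidence": 0.8}

print(select_action(actions, moral, progress, moral_weight=1.0))  # "most moral" policy
print(select_action(actions, moral, progress, moral_weight=0.0))  # amoral policy
```

Negating the morality term (or using `1 - morality`) would give the "most immoral" counterpart described in the abstract.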

Files

CSE3000_Final_Paper.pdf
(pdf | 0.275 Mb)
License info not available