Natural Language Processing and Reinforcement Learning to Generate Morally Aligned Text

Comparing a moral agent to an optimally playing agent

Bachelor Thesis (2023)
Author(s)

R.A.X.M. Lubbers (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Enrico Liscio – Mentor (TU Delft - Interactive Intelligence)

D. Mambelli – Mentor (TU Delft - Interactive Intelligence)

P.K. Murukannaiah – Mentor (TU Delft - Interactive Intelligence)

J. Yang – Mentor (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Rob Lubbers
Publication Year
2023
Language
English
Graduation Date
03-07-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Large Language Models are becoming increasingly prevalent in society. However, these models act without a sense of morality; they only prioritize accomplishing their goal, and little research has evaluated them in this respect. Current state-of-the-art Reinforcement Learning approaches represent the morality of a statement by a single scalar value. This representation is inaccurate, as multiple features determine how moral a statement is. We leverage the Moral Foundations Theory to represent morality more accurately, using a 5-dimensional vector of moral features. We implement several agents in an environment where decisions with possible moral implications need to be made, each using a different policy for selecting actions. Two policies always pick the most moral or the most immoral action, respectively; two others follow the same preference but also give some weight to game progression. Lastly, we consider an amoral agent that ignores morality entirely. We compare these agents by their completion percentage of the Infocom game Suspect. We find that the amoral agent achieves the highest completion rate, while agents that weigh morality heavily almost immediately get stuck in an infinite loop without progression.
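The thesis implementation itself is in the attached paper; the snippet below is only a minimal illustrative sketch of the kind of action selection the abstract describes, combining a 5-dimensional Moral Foundations vector with a game-progression estimate. All names, the averaging of the five foundations, and the linear weighting scheme are assumptions made for illustration, not the author's method.

```python
import numpy as np

# Hypothetical sketch: score candidate actions by combining a
# 5-dimensional Moral Foundations vector (care, fairness, loyalty,
# authority, sanctity) with an estimate of game progression.
# Names and the weighting scheme are illustrative assumptions only.

MORAL_DIMENSIONS = ["care", "fairness", "loyalty", "authority", "sanctity"]

def select_action(candidates, moral_scores, progress_scores, moral_weight=0.5):
    """Pick the action with the highest weighted sum of morality and progression.

    candidates      -- list of action strings
    moral_scores    -- dict: action -> 5-dim numpy array of foundation scores
    progress_scores -- dict: action -> scalar estimate of game progression
    moral_weight    -- 1.0 mimics a "most moral" policy, 0.0 an amoral one
    """
    def score(action):
        morality = moral_scores[action].mean()   # aggregate the five foundations
        progress = progress_scores[action]
        return moral_weight * morality + (1 - moral_weight) * progress

    return max(candidates, key=score)

# Toy usage with made-up numbers
actions = ["help the witness", "steal the evidence"]
moral = {
    "help the witness": np.array([0.9, 0.8, 0.5, 0.6, 0.7]),
    "steal the evidence": np.array([0.1, 0.2, 0.4, 0.3, 0.2]),
}
progress = {"help the witness": 0.3, "steal the evidence": 0.8}

print(select_action(actions, moral, progress, moral_weight=1.0))  # "most moral" policy
print(select_action(actions, moral, progress, moral_weight=0.0))  # amoral policy
```

Negating the morality term (or using `1 - morality`) would give the "most immoral" counterpart described in the abstract.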

Files

CSE3000_Final_Paper.pdf
(pdf | 0.275 Mb)
License info not available