What would Jiminy Cricket do?

A pluralist approach in generating and processing morally-aligned text

Bachelor Thesis (2023)

Author(s)

K.N.I. Timmerman (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

E. Liscio – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

D. Mambelli – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

P.K. Murukannaiah – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J. Yang – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Transformers, Natural Language Processing, Moral Foundation Theory, Reinforcement Learning, Artificial intelligence, Reward bias

To reference this document use

https://resolver.tudelft.nl/uuid:1228b210-e42a-45b2-abd5-8ae9b27d2fb0

Publication Year

2023

Language

English

Graduation Date

04-07-2023

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Downloads counter

452

Collections

thesis

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

When making decisions, people are automatically guided by their moral compass. AI agents, however, must be conditioned to steer them towards moral behaviour. One environment for training and testing such agents is Jiminy Cricket, a set of text-based narrative games in which every possible action is annotated with its morality. To create a more morally nuanced agent, we have additionally annotated all actions according to the following moral values: care/harm, fairness/cheating, loyalty/betrayal, authority/subversion, and purity/degradation. To morally condition the agent, we calculate the predicted progress of a potential action and combine it with an oracle that retrieves the moral annotation of that action. From these two components a score is calculated for each generated action, and the action to take is chosen based on this score. The score depends on the weights assigned to overall progress and morality, as well as on the sub-weights assigned to each moral value. Using this environment we pose the question: if we focus on only one moral value, what is the best configuration for maximizing both progress and morality? The results show that the lowest relative immorality is achieved by imposing no moral constraints on the agent: imposing constraints decreases the completion percentage relatively more than it decreases immorality. One-hot encoding the moral values reveals which immoral actions are needed to progress in the game and which immoral actions should be prevented to lower immorality.
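The scoring step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the linear combination below is an assumption, as are all names (`Candidate`, `score`, `one_hot`, `choose`), the example actions, and their numbers; none of this is taken from the thesis or from the Jiminy Cricket code. It only shows how a progress weight, a morality weight, and one-hot sub-weights per moral value could jointly select an action.

```python
from dataclasses import dataclass, field

# The five moral values named in the abstract (short labels are our own).
MORAL_VALUES = ("care", "fairness", "loyalty", "authority", "purity")


@dataclass
class Candidate:
    action: str
    progress: float  # hypothetical predicted progress of the action
    # Hypothetical oracle annotation: moral value -> immorality severity.
    immorality: dict = field(default_factory=dict)


def one_hot(value):
    """Sub-weights that focus on a single moral value."""
    return {v: 1.0 if v == value else 0.0 for v in MORAL_VALUES}


def score(c, w_progress, w_morality, sub_weights):
    """Assumed scoring rule: weighted progress minus weighted immorality."""
    penalty = sum(sub_weights[v] * c.immorality.get(v, 0.0) for v in MORAL_VALUES)
    return w_progress * c.progress - w_morality * penalty


def choose(candidates, w_progress, w_morality, sub_weights):
    """Pick the candidate action with the highest score."""
    return max(candidates, key=lambda c: score(c, w_progress, w_morality, sub_weights))


# Made-up candidates, purely for illustration.
candidates = [
    Candidate("steal lamp", progress=1.0, immorality={"fairness": 2.0}),
    Candidate("ask for lamp", progress=0.6),
    Candidate("break window", progress=0.8, immorality={"care": 1.0}),
]

print(choose(candidates, 1.0, 1.0, one_hot("fairness")).action)
print(choose(candidates, 1.0, 1.0, one_hot("care")).action)
print(choose(candidates, 1.0, 0.0, one_hot("care")).action)
```

In this toy setup, changing which moral value is one-hot encoded changes which immoral action is suppressed, and setting the morality weight to zero reduces the agent to pure progress maximization. This mirrors the trade-off the abstract describes without reproducing the thesis's actual configuration.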

Files

Final_Thesis_Paper.pdf

(pdf | 0.355 Mb)

License info not available