What would Jiminy Cricket do?

A pluralist approach in generating and processing morally-aligned text

Bachelor thesis (2023)

Authors

K.N.I. Timmerman Electrical Engineering, Mathematics and Computer Science

Contributors

E. Liscio Interactive Intelligence - (mentor)

D. Mambelli Interactive Intelligence - (mentor)

P.K. Murukannaiah Interactive Intelligence - (graduation committee member)

J. Yang Web Information Systems - (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science, Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:1228b210-e42a-45b2-abd5-8ae9b27d2fb0

More Info

expand_more

Published Date

04-07-2023

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Abstract

When making decisions, people are automatically guided by their moral compass. However, AI agents need to be conditioned in order to be steered towards moral behaviour. An environment that can be used to train and test agents is the Jiminy Cricket environment. The Jiminy Cricket environment consists of a set of text-based narrative games, where every action possible is annotated with the morality of that action. However, to create a more morally nuanced agent, we have annotated all of the actions according to the following moral values: Care/harm, fairness/cheating, loyalty/betrayal, authority/subversion, and purity/degradation. To morally condition the agent, we calculate the predicted progress of a potential action and combine it with an oracle to retrieve the moral annotation of the potential action. Using both of these components, the score per generated action is calculated and based on the score the eventual action is chosen. The score can be calculated differently based on the weights assigned to the overall progress and morality, as well as based on the sub-weights assigned to each moral value. Using this environment we pose the question, if we focus on only one moral value, what is the most optimal configuration that can be achieved in order to maximize both progress and morality? From the results we can observe that the lowest relative immorality can be achieved by imposing no moral constraints on the agent. Posing constraints on the agent will lead to a relatively bigger decrease of the completion percentage than to the immorality decrease. One-hot encoding the moral values will reveal which immoral actions are needed to progress in the game, and which immoral actions should to be prevented to lower immorality.

Files

Final_Thesis_Paper.pdf

(.pdf | 0.355 Mb)