What would Jiminy Cricket do?

A pluralist approach in generating and processing morally-aligned text


Abstract

When making decisions, people are automatically guided by their moral compass. AI agents, by contrast, need to be conditioned in order to be steered towards moral behaviour. One environment that can be used to train and test such agents is Jiminy Cricket, a set of text-based narrative games in which every possible action is annotated with its morality. To create a more morally nuanced agent, we have annotated all of the actions according to the following moral values: care/harm, fairness/cheating, loyalty/betrayal, authority/subversion, and purity/degradation. To morally condition the agent, we calculate the predicted progress of a potential action and combine it with an oracle that retrieves the moral annotation of that action. From these two components a score is calculated for each generated action, and the action to take is chosen based on that score. The score depends on the weights assigned to overall progress and morality, as well as on the sub-weights assigned to each moral value. Using this environment we pose the question: if we focus on only one moral value, what is the optimal configuration for maximizing both progress and morality? The results show that the lowest relative immorality is achieved by imposing no moral constraints on the agent. Imposing constraints leads to a relatively larger decrease in completion percentage than in immorality. One-hot encoding the moral values reveals which immoral actions are needed to progress in the game, and which immoral actions should be prevented to lower immorality.
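The weighted scoring described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so this assumes a simple linear combination: predicted progress times a progress weight, minus a morality weight times the sub-weighted sum of per-value immorality annotations. All function names, the candidate actions, and the numeric values are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: the linear scoring formula, the names, and the
# example data are assumptions, not the method's actual implementation.

MORAL_VALUES = ("care", "fairness", "loyalty", "authority", "purity")


def one_hot(value):
    """Sub-weights that penalise exactly one moral value."""
    return {v: 1.0 if v == value else 0.0 for v in MORAL_VALUES}


def score_action(progress, immorality, w_progress, w_morality, sub_weights):
    """Combine predicted progress with a weighted immorality penalty.

    `immorality` maps a moral value to the (oracle-provided) degree to
    which the action violates it; missing values count as 0.
    """
    penalty = sum(sub_weights[v] * immorality.get(v, 0.0) for v in MORAL_VALUES)
    return w_progress * progress - w_morality * penalty


def choose_action(candidates, w_progress, w_morality, sub_weights):
    """Return the name of the highest-scoring candidate action."""
    best = max(
        candidates,
        key=lambda c: score_action(
            c["progress"], c["immorality"], w_progress, w_morality, sub_weights
        ),
    )
    return best["action"]


# Hypothetical candidate actions with made-up progress and annotations.
CANDIDATES = [
    {"action": "steal lamp", "progress": 0.9, "immorality": {"fairness": 1.0}},
    {"action": "ask for lamp", "progress": 0.6, "immorality": {}},
    {"action": "smash window", "progress": 0.7, "immorality": {"care": 1.0}},
]

uniform = {v: 1.0 for v in MORAL_VALUES}
print(choose_action(CANDIDATES, 1.0, 0.0, uniform))              # no moral constraint
print(choose_action(CANDIDATES, 1.0, 1.0, one_hot("fairness")))  # penalise cheating only
print(choose_action(CANDIDATES, 1.0, 1.0, uniform))              # penalise all values
```

In this toy setting, a one-hot configuration on fairness only steers the agent away from the unfair action, so it may still pick an action that violates a different value. This mirrors how one-hot sub-weights isolate the effect of a single moral value.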