NLP and reinforcement learning to generate morally aligned text

How do explainable models perform compared to black-box models?

Abstract


This paper evaluates the performance of MoralStrength, an automated explainable model, at predicting morality, or more precisely the traits defined by Moral Foundations Theory (MFT). MFT represents and divides morality into a set of precise, detailed traits. The evaluation takes place in the Jiminy Cricket environment, a suite of 25 text-based games, and lets us estimate MoralStrength's domain adaptation as well as its limitations; the model's explainability helps us understand those limitations. We conclude that MoralStrength performs worse overall than other optimal models and that its adaptation to the Jiminy Cricket domain has some crucial flaws. These results nevertheless lead us to reflect on the explainability/accuracy trade-off and where to draw the line, given that explainable models are important for ethical decision-making.