NLP and reinforcement learning to generate morally aligned text

How do explainable models perform compared to black-box models?

Bachelor Thesis (2023)
Author(s)

N. De Leeuw (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Enrico Liscio – Mentor (TU Delft - Interactive Intelligence)

D. Mambelli – Mentor (TU Delft - Interactive Intelligence)

P.K. Murukannaiah – Mentor (TU Delft - Interactive Intelligence)

Jie Yang – Graduation committee member (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Nathaniël De Leeuw
Publication Year
2023
Language
English
Graduation Date
03-07-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract


This paper evaluates the performance of an automated explainable model, MoralStrength, at predicting morality, or more precisely Moral Foundations Theory (MFT) traits. MFT is a framework that represents and divides morality into precise, detailed traits. The evaluation takes place in the Jiminy Cricket environment, a collection of 25 text-based games. It helps us estimate the domain adaptation of MoralStrength as well as its limitations, and the explainability of the model helps us understand those limitations. We conclude that MoralStrength performs worse overall than other optimal models and that its adaptation to the Jiminy Cricket domain has some crucial flaws. Nonetheless, the results lead us to consider the explainability/accuracy trade-off and where to draw the line, knowing that explainable models are important for ethical decision-making.

Files

NathanielDeLeeuwPaper.pdf
(pdf | 0.584 MB)
License info not available