Title: NLP and reinforcement learning to generate morally aligned text: How does explainable models perform compared to black-box models
Author: De Leeuw, Nathaniël (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: Liscio, E. (mentor); Mambelli, D. (mentor); Murukannaiah, P.K. (mentor); Yang, J. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science and Engineering
Project: CSE3000 Research Project
Date: 2023-07-03

Abstract: This paper evaluates the performance of an automated explainable model, MoralStrength, at predicting morality, or more precisely Moral Foundations Theory (MFT) traits. MFT is a way to represent and divide morality into precise and detailed traits. The evaluation takes place in the Jiminy Cricket environment, a benchmark composed of 25 text-based games. It lets us estimate the domain adaptation of MoralStrength as well as its limitations, and the explainability of the model helps us understand those limitations. We conclude that MoralStrength performs worse overall than other optimal models and that its adaptation to the Jiminy Cricket domain has some crucial flaws, but this leads us to reflect on the explainability/accuracy trade-off and where to draw the line, given that explainable models are important for ethical decision-making.

Subject: NLP; RL; Ethics
To reference this document use: http://resolver.tudelft.nl/uuid:60166a01-cdbe-40af-8cb7-4e5595d108d3
Part of collection: Student theses
Document type: bachelor thesis
Rights: © 2023 Nathaniël De Leeuw
Files: NathanielDeLeeuwPaper.pdf (PDF, 598.35 KB)