NLP and reinforcement learning to generate morally aligned text
How do explainable models perform compared to black-box models?
N. De Leeuw (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Enrico Liscio – Mentor (TU Delft - Interactive Intelligence)
D. Mambelli – Mentor (TU Delft - Interactive Intelligence)
P.K. Murukannaiah – Mentor (TU Delft - Interactive Intelligence)
Jie Yang – Graduation committee member (TU Delft - Web Information Systems)
Abstract
This paper evaluates the performance of an automated explainable model, MoralStrength, at predicting morality, or more precisely Moral Foundations Theory (MFT) traits. MFT is a framework that represents and divides morality into precise, detailed traits. The evaluation is carried out in the Jiminy Cricket environment, a suite of 25 text-based games, and helps us estimate the domain adaptation of MoralStrength as well as its limitations; the explainability of the model helps us understand those limitations. We conclude that MoralStrength performs worse overall than other, optimal models and that its adaptation to the Jiminy Cricket domain has some crucial flaws. These findings nevertheless lead us to reflect on the explainability/accuracy trade-off and where to draw the line, given that explainable models are important for ethical decision-making.