Evaluating interpretability of state-of-the-art NLP models for predicting moral values
I.L. Constantinescu (TU Delft - Electrical Engineering, Mathematics and Computer Science)
E. Liscio – Mentor (TU Delft - Interactive Intelligence)
P.K. Murukannaiah – Mentor (TU Delft - Interactive Intelligence)
R. Guerra Marroquim – Graduation committee member (TU Delft - Computer Graphics and Visualisation)
Abstract
Understanding personal values can greatly facilitate collaboration between AI systems and humans. However, deploying collaborative agents in real life depends heavily on the trust that people build in their relationship with these systems. To help bridge this trust gap, a more extensive analysis of the explainability of such systems is needed. We implement LSTM, BERT, and FastText, three neural text classification models, and compare their interpretability on the task of predicting moral values from opinionated text. The results highlight the different degrees to which the behaviour of the three models can be explained in the context of moral value prediction. Our experiments show that BERT, the current state of the art in natural language processing, achieves the best performance while also providing more interpretable predictions than the other two models.
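For illustration, the sketch below shows the kind of pipeline the abstract describes: a BERT-based classifier over moral-value labels, paired with a simple gradient-based token saliency score as an interpretability signal. This is not the authors' implementation; the model checkpoint, the label set (the five moral foundations), and the choice of gradient saliency as the attribution method are assumptions made for the example, and in practice the classifier would first be fine-tuned on a labelled moral-values corpus.

```python
# Minimal sketch (assumptions, not the authors' code): a BERT classifier for
# moral values plus per-token gradient saliency as an interpretability signal.
# Requires the Hugging Face `transformers` and `torch` libraries.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed label set based on Moral Foundations Theory; the paper's dataset may differ.
MORAL_LABELS = ["care", "fairness", "loyalty", "authority", "purity"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(MORAL_LABELS)
)
model.eval()  # note: the classification head is untrained here; fine-tune first in practice


def token_saliency(text: str):
    """Return (tokens, saliency scores, predicted label) for one input text."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    # Look up input embeddings manually so gradients can be taken w.r.t. them.
    embeddings = model.get_input_embeddings()(enc["input_ids"]).detach()
    embeddings.requires_grad_(True)
    out = model(inputs_embeds=embeddings, attention_mask=enc["attention_mask"])
    pred = out.logits.argmax(dim=-1).item()
    # Backpropagate the predicted-class logit down to the input embeddings.
    out.logits[0, pred].backward()
    # L2 norm of the gradient per token as a rough importance score.
    scores = embeddings.grad.norm(dim=-1).squeeze(0)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return tokens, scores.tolist(), MORAL_LABELS[pred]


tokens, scores, label = token_saliency("Everyone deserves equal treatment under the law.")
for tok, score in zip(tokens, scores):
    print(f"{tok:>15s}  {score:.4f}")
print("predicted moral value:", label)
```

Gradient saliency is only one of several attribution techniques that could back such a comparison; attention weights or perturbation-based methods (e.g. LIME) follow the same pattern of scoring input tokens by their contribution to a single prediction.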