Exploring Stance Detection of Opinion Texts: Evaluating the Performance of a Large Language Model
Benchmarking the Performance of Stance Classification by GPT-3.5-Turbo
N. Mateijsen (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Morita Tarvirdians – Mentor (TU Delft - Interactive Intelligence)
C.M. Jonker – Mentor (TU Delft - Interactive Intelligence)
M.L. Molenaar – Graduation committee member (TU Delft - Computer Graphics and Visualisation)
Abstract
In April 2020, a Dutch research team swiftly analyzed public opinions on COVID-19 lockdown relaxations. However, due to time constraints, only a small amount of the opinion data could be processed. With the surge in popularity of Natural Language Processing (NLP) and the arrival of tools like ChatGPT, a number of tasks involving Large Language Models (LLMs) have become easier. This study assesses the effectiveness of these LLMs at stance detection using this COVID-19 opinion corpus. The corpus is chunked and sampled to serve as input for OpenAI's GPT-3.5-Turbo LLM. The machine-generated stances are then evaluated using multiple binary classification metrics. The results show that the model performs very well at stance detection, with an average F-score of 0.895. However, a significant number of misclassifications are observed in one dataset. We therefore conclude that while LLMs offer valuable guidance, it is still crucial to verify their outputs when dealing with complex or important public matters.
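The pipeline described above (prompt GPT-3.5-Turbo for a stance label, then score the labels against gold annotations) could be sketched as follows. This is a minimal illustration, not the thesis's exact setup: the prompt wording, the binary label set ("favor"/"against"), and the `samples` variable are assumptions introduced here for demonstration.

```python
# Minimal sketch of a stance-detection pipeline with GPT-3.5-Turbo.
# Assumptions (not from the thesis): the prompt text, the label set
# {"favor", "against"}, and the `samples` placeholder data.
from openai import OpenAI
from sklearn.metrics import f1_score

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify_stance(text: str) -> str:
    """Ask GPT-3.5-Turbo for a binary stance label on one opinion text."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # reduce output variability for evaluation
        messages=[
            {
                "role": "system",
                "content": (
                    "Classify the stance of the text toward relaxing "
                    "COVID-19 lockdown measures. Answer with exactly "
                    "one word: 'favor' or 'against'."
                ),
            },
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip().lower()


# Hypothetical (text, gold_label) pairs standing in for the sampled corpus.
samples = [("The measures should be lifted immediately.", "favor")]
gold = [label for _, label in samples]
predicted = [classify_stance(text) for text, _ in samples]
print("F-score:", f1_score(gold, predicted, pos_label="favor"))
```

Pinning the temperature to 0 and constraining the model to a fixed label vocabulary makes the generated stances directly comparable to the gold annotations, which is what allows standard binary classification metrics such as the F-score to be computed.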