Exploring Stance Detection of Opinion Texts: Evaluating the Performance of a Large Language Model
Benchmarking the Performance of Stance Classification by GPT-3.5-Turbo
N. Mateijsen (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Morita Tarvirdians – Mentor (TU Delft - Interactive Intelligence)
C.M. Jonker – Mentor (TU Delft - Interactive Intelligence)
M.L. Molenaar – Graduation committee member (TU Delft - Computer Graphics and Visualisation)
Abstract
In April 2020, a Dutch research team swiftly analyzed public opinions on COVID-19 lockdown relaxations. However, due to time constraints, only a small amount of the opinion data could be processed. With the surge in popularity of Natural Language Processing (NLP) and the arrival of tools like ChatGPT, a number of tasks involving Large Language Models (LLMs) have become easier. This study assesses the effectiveness of these LLMs at stance detection using this COVID-19 opinion corpus. The corpus is chunked and sampled to serve as input for OpenAI's GPT-3.5-Turbo LLM. The machine-generated stances are then evaluated using multiple binary classification metrics. The results show that the model performs very well at stance detection, with an average F-score of 0.895. However, a significant number of misclassifications are observed in one dataset. We therefore conclude that while LLMs offer valuable guidance, it is still crucial to verify their outputs when dealing with complex or important public matters.
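The pipeline described above (prompt GPT-3.5-Turbo for a stance label, then score the labels against gold annotations) could be sketched as follows. This is a minimal illustration, not the thesis's exact setup: the prompt wording, the binary label set ("favor"/"against"), and the `samples` variable are assumptions introduced here for demonstration.

```python
# Minimal sketch of a stance-detection pipeline with GPT-3.5-Turbo.
# Assumptions (not from the thesis): the prompt text, the label set
# {"favor", "against"}, and the `samples` placeholder data.
from openai import OpenAI
from sklearn.metrics import f1_score

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify_stance(text: str) -> str:
    """Ask GPT-3.5-Turbo for a binary stance label on one opinion text."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # reduce output variability for evaluation
        messages=[
            {
                "role": "system",
                "content": (
                    "Classify the stance of the text toward relaxing "
                    "COVID-19 lockdown measures. Answer with exactly "
                    "one word: 'favor' or 'against'."
                ),
            },
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip().lower()


# Hypothetical (text, gold_label) pairs standing in for the sampled corpus.
samples = [("The measures should be lifted immediately.", "favor")]
gold = [label for _, label in samples]
predicted = [classify_stance(text) for text, _ in samples]
print("F-score:", f1_score(gold, predicted, pos_label="favor"))
```

Pinning the temperature to 0 and constraining the model to a fixed label vocabulary makes the generated stances directly comparable to the gold annotations, which is what allows standard binary classification metrics such as the F-score to be computed.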