Leveraging Large Language Models for Classifying Subjective Arguments in Public Discourse
A. Dobrinoiu (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Luciano Cavalcante Siebert – Mentor (TU Delft - Interactive Intelligence)
A. Homayounirad – Mentor (TU Delft - Interactive Intelligence)
Enrico Liscio – Mentor (TU Delft - Interactive Intelligence)
J. Yang – Graduation committee member (TU Delft - Web Information Systems)
Abstract
This study investigates the effectiveness of Large Language Models (LLMs) in identifying and classifying subjective arguments within deliberative discourse. Using data from a Participatory Value Evaluation (PVE) conducted in the Netherlands, this research introduces an annotation strategy for identifying arguments and extracting their premises. The Llama 2 model is then used to test three prompting approaches: zero-shot, one-shot, and few-shot. Performance is evaluated using cosine similarity and subsequently improved by introducing chain-of-thought prompting. The results show that zero-shot prompting unexpectedly outperforms one-shot and few-shot prompting, because the LLM overfits to the provided examples. Chain-of-thought prompting is shown to improve the argument identification task. The subjectivity of the annotation task is reflected in the low average pairwise F1 score between annotators and in the considerable variance in the number of data items each annotator marked as not being arguments. This subjectivity is further highlighted by a pairwise chain-of-thought prompting analysis, which shows that annotators with more similar annotations received more similar LLM responses.
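As a minimal sketch of the evaluation step mentioned in the abstract, the snippet below scores an LLM-extracted premise against a human-annotated reference premise with cosine similarity over sentence embeddings. The embedding model (all-MiniLM-L6-v2), the helper name premise_similarity, and the example sentences are illustrative assumptions; the thesis does not prescribe this particular implementation.

```python
# Illustrative sketch (not the thesis implementation): comparing an
# LLM-extracted premise with a human-annotated reference premise via
# cosine similarity. Assumes the `sentence-transformers` package; the
# embedding model is an arbitrary choice for demonstration purposes.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model


def premise_similarity(llm_premise: str, annotated_premise: str) -> float:
    """Cosine similarity between an LLM output and a reference annotation."""
    embeddings = model.encode([llm_premise, annotated_premise])
    return float(util.cos_sim(embeddings[0], embeddings[1]))


# Example usage with made-up, PVE-style premise text:
score = premise_similarity(
    "Reopening businesses first supports people who lost their income.",
    "Lifting restrictions for businesses helps citizens whose income was hit.",
)
print(f"cosine similarity: {score:.3f}")
```

A higher score indicates closer agreement between the model's extracted premise and the annotator's reference, which is the intuition behind using cosine similarity as the evaluation metric here.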