Leveraging LLMs for Classifying Subjective Topics Behind Public Discourse

Bachelor Thesis (2024)

Author(s)

A. Marcu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Luciano Cavalcante Siebert – Mentor (TU Delft - Interactive Intelligence)

A. Homayounirad – Mentor (TU Delft - Interactive Intelligence)

Enrico Liscio – Mentor (TU Delft - Interactive Intelligence)

J. Yang – Graduation committee member (TU Delft - Web Information Systems)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Multi-label classification; Large Language Models (LLMs); Fine-Tuning; Prompt engineering; Public Deliberation; Subjective topics

To reference this document use:

https://resolver.tudelft.nl/uuid:70b36d6b-054e-45b0-b4ad-7e424c5e67ab

Publication Year

2024

Language

English

Graduation Date

27-06-2024

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Public deliberations play a crucial role in democratic systems. However, because deliberations are unstructured, moderators struggle to analyze the large volume of data they produce. This paper addresses that challenge by using Large Language Models (LLMs) to automatically identify the subjective topics behind public discourse. The study is structured around two core objectives: Identifying Gold Labels and Exploring Subjective Human Labels. The results show that fine-tuning the Llama 2 model with QLoRA outperforms the other methods for Identifying Gold Labels, while the Few-Shot Chain-of-Thought method, enhanced with EmotionPrompt, is particularly effective at capturing subjective variation in human annotations. The study also underscores significant limitations, such as the dependence on large, high-quality annotated datasets and the tendency of models to hallucinate. These findings highlight the potential of LLMs to identify subjective topics behind public discourse, while emphasizing the need for further research to address these limitations.
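As a rough illustration of the prompting approach named in the abstract, the sketch below assembles a few-shot chain-of-thought prompt with an EmotionPrompt-style suffix and parses a multi-label answer. It is not the thesis's implementation: the topic list, the `Example` type, the prompt wording, the emotional stimulus sentence, and the `Labels:` output convention are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical topic set; the label set used in the thesis is not stated here.
TOPICS = ["economy", "environment", "health"]

# Assumed emotional stimulus in the style of EmotionPrompt.
EMOTION_SUFFIX = "This is very important to my career."


@dataclass
class Example:
    """One few-shot demonstration: a comment, a short rationale, its labels."""
    text: str
    reasoning: str
    labels: list[str]


def build_prompt(examples, query, topics=TOPICS):
    """Build a few-shot chain-of-thought prompt for multi-label classification."""
    lines = [
        "Assign every applicable topic to the comment. "
        f"Allowed topics: {', '.join(topics)}. "
        "Reason step by step, then finish with 'Labels: <comma-separated topics>'. "
        + EMOTION_SUFFIX,
        "",
    ]
    # Each demonstration shows the reasoning before the final labels.
    for ex in examples:
        lines += [
            f"Comment: {ex.text}",
            f"Reasoning: {ex.reasoning}",
            f"Labels: {', '.join(ex.labels)}",
            "",
        ]
    # The query ends at 'Reasoning:' so the model continues from there.
    lines += [f"Comment: {query}", "Reasoning:"]
    return "\n".join(lines)


def parse_labels(completion, topics=TOPICS):
    """Read the last 'Labels:' line and keep only topics from the allowed set.

    Dropping unknown labels is a simple guard against hallucinated topics.
    """
    for line in reversed(completion.splitlines()):
        if line.strip().lower().startswith("labels:"):
            raw = line.split(":", 1)[1]
            found = [part.strip().lower() for part in raw.split(",")]
            return [t for t in topics if t in found]
    return []
```

The string returned by `build_prompt` would be sent to whichever model is being evaluated, and `parse_labels` applied to its completion. Because the result is a list, a comment can receive zero, one, or several topics.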

Files

CSE3000_Research_Project_Paper... (pdf)

(pdf | 3.08 MB)

License info not available