V. Bunovska

Bachelor thesis (1)

1 record found

The Alignment of Large Language Models' Responses to Subjective Variations in Hate Speech

Comparing Alignment to Real-Life-Inspired Definitions in Zero-Shot Hate Speech Classification

Bachelor thesis (2026) - V. Bunovska, P.K. Murukannaiah, U. Khurana, C.C.S. Liem

Detecting hateful content on social media has become an active area of research, with recent approaches focusing on the use of Large Language Models (LLMs). Rather than using datasets to train classifiers, researchers are exploring methods that embed hate speech definitions directly in the model's prompt. However, hate speech is a subjective concept, and its definition varies across contexts. As a result, LLMs must align their classifications with the specific definition provided in the prompt. To make the creation of such definitions more systematic, frameworks for constructing context-specific definitions of hate speech have been proposed. Yet no work has compared how framework-based formulations influence LLM alignment relative to the definitions used in real-life regulation, such as laws and social media policies. This study therefore compares definitions from the Hate Speech Criteria (HSC) framework, legal texts, and platform policies by evaluating how precisely two LLMs align with each type under a zero-shot prompting setup. Our results indicate that while the level of alignment is model-dependent, legal and policy definitions generally guide LLM behavior more effectively than framework-based formulations. Nevertheless, definitions created with the framework still steer models in the intended direction, suggesting that further refinement of these frameworks could improve their effectiveness in prompt-based hate speech detection. ...
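
The zero-shot setup described in the abstract, where a definition is embedded in the prompt and the model's labels are compared with the labels that definition implies, might be sketched roughly as follows. This is an illustrative sketch only, not the thesis's implementation: the prompt wording, the function names (`build_prompt`, `parse_label`, `agreement_rate`), the yes/no answer format, and the use of a simple agreement rate as the alignment measure are all assumptions. The `keyword_stub` function is a hypothetical stand-in for a real LLM call.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical prompt wording; the thesis's actual prompts are not given in the abstract.
PROMPT_TEMPLATE = (
    "Definition of hate speech:\n{definition}\n\n"
    "Text:\n{text}\n\n"
    "According to the definition above, is the text hate speech? "
    "Answer with exactly one word: yes or no."
)


def build_prompt(definition: str, text: str) -> str:
    """Embed the definition and the text to classify in one zero-shot prompt."""
    return PROMPT_TEMPLATE.format(definition=definition.strip(), text=text.strip())


def parse_label(reply: str) -> bool:
    """Map a model reply to True (hate speech) or False; reject anything else."""
    words = reply.strip().lower().split()
    first = words[0].strip(".,!:;\"'") if words else ""
    if first == "yes":
        return True
    if first == "no":
        return False
    raise ValueError(f"Unparseable reply: {reply!r}")


def agreement_rate(
    model: Callable[[str], str],
    definition: str,
    examples: Iterable[Tuple[str, bool]],
) -> float:
    """Fraction of examples where the model's label matches the expected label.

    Assumption: each expected label is the label implied by `definition`;
    a plain agreement rate is used here as a simple proxy for alignment.
    """
    examples = list(examples)
    if not examples:
        raise ValueError("At least one example is required.")
    hits = sum(
        parse_label(model(build_prompt(definition, text))) == expected
        for text, expected in examples
    )
    return hits / len(examples)


def keyword_stub(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: inspects only the text part of the prompt."""
    text = prompt.split("Text:\n", 1)[1].split("\n\n", 1)[0]
    return "Yes." if "placeholder-slur" in text.lower() else "No."


if __name__ == "__main__":
    definition = "Content that attacks a person because of a protected characteristic."
    examples = [
        ("you are a placeholder-slur", True),
        ("nice weather today", False),
        ("I disagree with you", True),
    ]
    print(agreement_rate(keyword_stub, definition, examples))
```

Under these assumptions, comparing definition types would amount to calling `agreement_rate` once per definition (framework-based, legal, platform policy) with the same model and examples labelled according to that definition, then comparing the resulting rates.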