Tailoring In-Context Learning Techniques for Definition-Based Hate Speech Detection in Large Language Models

Bachelor Thesis (2026)
Author(s)

Parham Bateni (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Pradeep Kumar Murukannaiah – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Urja Khurana – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Cynthia Liem – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
23-06-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project, The Alignment of Large Language Models' Responses to Subjective Variations in Hate Speech
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Hate speech lacks a single agreed definition across legal, social, and benchmark contexts, yet instruction-tuned large language models (LLMs) are increasingly used for hate speech detection. While recent work has explored definition-aware prompting, it remains unclear how different definitions interact with few-shot prompting strategies and model capacity. We investigate whether zero-shot and few-shot in-context learning can align LLMs with dataset-specific hate speech definitions without fine-tuning. Using the HateCheck benchmark, we evaluate three models (Gemma-2-2B, Llama-3.2-3B, and Qwen2.5-3B) under three definition settings (no definition, an author-provided textual definition, and a structured criteria-based definition) and four prompting strategies (zero-shot and three few-shot variants). Results show that explicit definitions do not reliably improve performance and can sometimes reduce it. Few-shot prompting is generally more effective than zero-shot prompting, with the strongest performance often achieved by retrieving semantically similar examples for each query and including them in the prompt. Higher-capacity models benefit more from richer prompts, whereas the smallest model frequently degrades as prompt complexity increases. Overall, definition wording, exemplar selection, and model capacity interact strongly and should be tuned jointly rather than in isolation.
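The best-performing few-shot variant mentioned in the abstract retrieves, for each query, the labeled examples most similar to it and places them in the prompt, optionally after a definition. The following is a minimal sketch of that idea under stated assumptions, not the thesis's implementation: it substitutes a bag-of-words cosine similarity for whatever semantic similarity measure the study used (a sentence-embedding model would be the usual choice), and the function names, prompt wording, label strings, definition text, and example texts are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical labeled pool; texts are invented and [GROUP] is a placeholder target.
POOL = [
    ("I hate [GROUP], they should all disappear.", "hateful"),
    ("I love spending time with my friends.", "non-hateful"),
    ("Saying you hate [GROUP] is never acceptable.", "non-hateful"),
    ("The weather today is lovely.", "non-hateful"),
]


def tokenize(text: str) -> Counter:
    """Lowercased bag of words; an assumed, crude stand-in for a sentence-embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_examples(query, pool, k=2):
    """Return the k pool examples most similar to the query (ties keep pool order)."""
    q = tokenize(query)
    ranked = sorted(pool, key=lambda ex: cosine_similarity(q, tokenize(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query, pool, definition=None, k=2):
    """Assemble an optional definition, k retrieved demonstrations, and the query."""
    parts = []
    if definition is not None:
        # definition=None corresponds to a "no definition" setting.
        parts.append(f"Definition of hate speech: {definition}")
    parts.append("Classify each text as 'hateful' or 'non-hateful'.")
    for text, label in retrieve_examples(query, pool, k):
        parts.append(f"Text: {text}\nLabel: {label}")
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Hypothetical definition text, for illustration only.
    print(build_prompt("I really hate [GROUP].", POOL,
                       definition="Abuse targeting a protected group or its members."))
```

Replacing `tokenize` and `cosine_similarity` with an embedding model changes only how similarity is scored; the retrieve-then-prompt structure stays the same, and the zero-shot case is simply `k=0`.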
