Tailoring In-Context Learning Techniques for Definition-Based Hate Speech Detection in Large Language Models

Bachelor Thesis (2026)

Author(s)

Parham Bateni (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Pradeep Kumar Murukannaiah – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Urja Khurana – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Cynthia Liem – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Large Language Models (LLMs), Zero-Shot Learning, Few-Shot Learning, Prompt Engineering, Hate Speech Detection, In-Context Learning

To reference this document use

https://resolver.tudelft.nl/uuid:cbf678c1-c5a5-4810-9f9c-4419a1c4bcaf

Publication Year

2026

Language

English

Graduation Date

23-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project, The Alignment of Large Language Models' Responses to Subjective Variations in Hate Speech

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Hate speech lacks a single agreed-upon definition across legal, social, and benchmark contexts, yet instruction-tuned large language models (LLMs) are increasingly used to detect it. Recent work has explored definition-aware prompting, but it remains unclear how different definitions interact with few-shot prompting strategies and model capacity. We investigate whether zero-shot and few-shot in-context learning can align LLMs with dataset-specific hate speech definitions without fine-tuning. Using the HateCheck benchmark, we evaluate three models (Gemma-2-2B, Llama-3.2-3B, and Qwen2.5-3B) under three definition settings (no definition, an author-provided text definition, and a structured criteria-based definition) and four prompting strategies (zero-shot and three few-shot variants). Results show that explicit definitions do not reliably improve performance and can sometimes reduce it. Few-shot prompting is generally more effective, with the strongest performance often achieved by retrieving semantically similar examples for each query and including them in the prompt. Higher-capacity models benefit more from richer prompts, whereas the smallest model frequently degrades as prompt complexity increases. Overall, definition wording, exemplar selection, and model capacity interact strongly and should be tuned jointly rather than in isolation.
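
The retrieval-based few-shot strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not say how semantic similarity is computed, so the sketch substitutes a simple bag-of-words cosine similarity as a stand-in for an embedding model. The example pool, the prompt wording, and the function names (bag_of_words, cosine, retrieve, build_prompt) are all hypothetical, and the toy sentences are not taken from HateCheck.

```python
import math
import re
from collections import Counter

# Hypothetical labeled pool; toy sentences for illustration, not HateCheck data.
POOL = [
    ("I really love rainy days in the city.", "non-hateful"),
    ("People from that group should all be thrown out.", "hateful"),
    ("The bus was late again this morning.", "non-hateful"),
    ("Those people are vermin and do not belong here.", "hateful"),
]


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, pool, k=2):
    """Return the k pool examples most similar to the query.

    Assumption: token overlap stands in for the semantic similarity
    that an embedding model would normally provide.
    """
    q = bag_of_words(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, bag_of_words(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query, pool, definition=None, k=2):
    """Assemble an optional definition, k retrieved exemplars, and the query."""
    parts = []
    if definition:
        parts.append(f"Definition: {definition}")
    parts.append("Classify each text as hateful or non-hateful.")
    for text, label in retrieve(query, pool, k):
        parts.append(f"Text: {text}\nLabel: {label}")
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("Those people do not belong in this city.", POOL, k=2))
```

Passing definition=None corresponds to the no-definition setting, and setting k=0 reduces the prompt to a zero-shot one, so the same helper covers several cells of the definition-by-strategy grid described above.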

Files

Research_Paper_Final.pdf

(pdf | 0.739 MB)

License info not available