What if fanfiction, but also coding: Investigating cultural differences in fanfiction writing and reviewing with machine learning methods

Fine Tuning a BERT-based Pre-Trained Language Model for Named Entity Extraction within the Domain of Fanfiction

Bachelor Thesis (2025)
Author(s)

N.P.A. Kindt (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

H.S. Hung – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

C. Hao – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

I. Kondyurin – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

E. Eisemann – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
07-02-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The introduction of Pre-trained Language Models (PLMs) has revolutionised the field of Natural Language Processing (NLP) and paved the way for new large-scale studies across many areas of research. One such area is the emerging digital literary corpus of fanfiction, which offers research opportunities in NLP, computational (socio-)linguistics, the social sciences and the digital humanities. However, because of the unique linguistic characteristics of this literary domain, many modern NLP solutions built on PLMs encounter difficulties when applied to fanfiction texts. This paper aims to show that the performance of PLMs on various NLP tasks over fanfiction texts can be improved by applying Domain Adaptive Pre-Training (DAPT). A case study demonstrates that the performance of a BERT-based PLM on the downstream NLP task of Named Entity Recognition (NER) can be improved through supervised domain-specific fine-tuning. While we obtain a 6% increase in F1 score, we remain sceptical of this result: the limited amount of annotated data available led the model to overfit and to show a lack of capacity to generalise to unseen data from the CoNLL NER dataset.
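
For illustration, the supervised fine-tuning step described in the abstract could be set up roughly as follows with the Hugging Face transformers library. This is a minimal sketch under stated assumptions, not the thesis's actual pipeline: the base checkpoint, label set, hyperparameters and the dataset names train_split / dev_split are placeholders standing in for the annotated fanfiction data.

# Minimal sketch of supervised fine-tuning of a BERT-based model for NER.
# The checkpoint, label set, hyperparameters and dataset names are assumptions.
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    TrainingArguments,
    Trainer,
)

MODEL_NAME = "bert-base-cased"                       # assumed base checkpoint
LABELS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]   # illustrative label set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME,
    num_labels=len(LABELS),
    id2label=dict(enumerate(LABELS)),
    label2id={label: i for i, label in enumerate(LABELS)},
)

def tokenize_and_align(example):
    """Tokenize pre-split words and align word-level NER tags to sub-word tokens."""
    encoded = tokenizer(example["tokens"], is_split_into_words=True, truncation=True)
    labels = []
    for word_id in encoded.word_ids():
        # Special tokens get -100 so they are ignored by the loss;
        # sub-word pieces inherit the label of their source word.
        labels.append(example["ner_tags"][word_id] if word_id is not None else -100)
    encoded["labels"] = labels
    return encoded

# `train_split` and `dev_split` stand in for the annotated fanfiction data:
# train_dataset = train_split.map(tokenize_and_align)
# eval_dataset = dev_split.map(tokenize_and_align)

training_args = TrainingArguments(
    output_dir="fanfic-ner",
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

# trainer = Trainer(
#     model=model,
#     args=training_args,
#     train_dataset=train_dataset,
#     eval_dataset=eval_dataset,
#     data_collator=DataCollatorForTokenClassification(tokenizer),
# )
# trainer.train()

With so small an annotated corpus, a setup like this can reach a high F1 score on its own dev split while still overfitting, which is consistent with the scepticism expressed in the abstract about generalisation to the CoNLL NER data.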

Files

License info not available