What if fanfiction, but also coding: Investigating cultural differences in fanfiction writing and reviewing with machine learning methods

Fine Tuning a BERT-based Pre-Trained Language Model for Named Entity Extraction within the Domain of Fanfiction


Abstract

The introduction of Pre-Trained Language Models (PLMs) has revolutionised the field of Natural Language Processing (NLP) and paved the way for many new, exciting large-scale studies across various areas of research. One such area presents itself in the emerging digital literary corpus that is fanfiction, which offers research opportunities within NLP, Computational (Socio-)Linguistics, the Social Sciences and the Digital Humanities. However, because of the unique linguistic characteristics of this literary domain, many modern NLP solutions utilizing PLMs encounter difficulties when applied to fanfiction texts. This paper aims to show that the performance of PLMs on various NLP tasks over fanfiction texts can be improved by applying Domain Adaptive Pre-Training (DAPT). A case study is performed to show that the performance of a BERT-based PLM on the downstream NLP task of Named Entity Recognition (NER) can be improved by applying supervised, domain-specific fine-tuning. While we gain a 6% increase in F1 score, we are sceptical about these results: due to the limited amount of annotated data available, the model overfits and shows a lack of capacity to generalize to unseen data from the CoNLL NER dataset.
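
For readers unfamiliar with the fine-tuning setup described above, the sketch below illustrates supervised fine-tuning of a BERT-based encoder for NER framed as token classification. It is a minimal example under stated assumptions, not the exact pipeline used in this thesis: it assumes the Hugging Face transformers and datasets libraries, the generic bert-base-cased checkpoint, and the public CoNLL-2003 dataset (referred to in the abstract only as the generalization benchmark) as a stand-in for the fanfiction annotations, which are not included here.

# Illustrative sketch (not the thesis pipeline): fine-tuning a BERT-based model
# for NER as token classification with Hugging Face Transformers.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    TrainingArguments,
    Trainer,
)

MODEL_NAME = "bert-base-cased"  # assumed base checkpoint; the thesis may use another BERT variant

# CoNLL-2003 stands in for the (non-public) annotated fanfiction data.
dataset = load_dataset("conll2003")
label_list = dataset["train"].features["ner_tags"].feature.names

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(label_list)
)


def tokenize_and_align_labels(examples):
    """Tokenize pre-split words and align NER tags with the resulting sub-word tokens."""
    tokenized = tokenizer(
        examples["tokens"], truncation=True, is_split_into_words=True
    )
    labels = []
    for i, tags in enumerate(examples["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        previous = None
        aligned = []
        for word_id in word_ids:
            if word_id is None:
                aligned.append(-100)           # special tokens: ignored by the loss
            elif word_id != previous:
                aligned.append(tags[word_id])  # label only the first sub-token of each word
            else:
                aligned.append(-100)           # ignore continuation sub-tokens
            previous = word_id
        labels.append(aligned)
    tokenized["labels"] = labels
    return tokenized


tokenized_dataset = dataset.map(tokenize_and_align_labels, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bert-ner-finetuned",
        learning_rate=2e-5,          # illustrative hyperparameters, not those of the thesis
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,
)

trainer.train()

In this framing, each sub-word token receives an entity label and the model is trained with a standard token-classification head on top of the encoder; evaluation against a held-out split such as CoNLL-2003 is what exposes the overfitting and limited generalization discussed in the abstract.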
