Authorship attribution is the task of determining the unknown author of a text. In forensic authorship attribution, the likelihood that a suspect has written a specific text of unknown origin is computed based on reference texts from both the suspect and a background population. The current method used at the Netherlands Forensic Institute consists of a manual and a computational part. In this thesis, we attempt to improve the computational part of this process, approaching the problem from three directions.
Firstly, the performance of state-of-the-art computational authorship attribution methods was assessed on Dutch, forensically relevant corpora. The compared methods were support vector machines combined with masking, using either word or character n-grams as features; BERT-based models using a mean pooling strategy to handle long texts; and a baseline consisting of a logistic regression model with the 100 most frequent Dutch words as features. We observed performance differences between the state-of-the-art methods similar to those reported in the literature. The best-performing method was a support vector machine without masking using character n-grams as features. In comparison, both the baseline and the BERT-based models performed worse on our corpora.
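As a minimal illustration of the feature side of these methods, the sketch below masks words that fall outside a list of frequent words and then counts character n-grams on the masked text. The masking scheme (replacing each letter with an asterisk), the helper names, and the tiny word list are assumptions made for illustration only; they are not the exact configuration used in the thesis.

```python
from collections import Counter


def mask_infrequent_words(text, frequent_words, mask_char="*"):
    """Replace every letter of a word outside `frequent_words` with `mask_char`."""
    masked = []
    for token in text.split():
        if token.lower() in frequent_words:
            masked.append(token)
        else:
            # Keep punctuation and digits, hide only the letters.
            masked.append("".join(mask_char if ch.isalpha() else ch for ch in token))
    return " ".join(masked)


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of length n."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Hypothetical list of frequent Dutch words (illustration only).
frequent = {"de", "het", "een", "en", "is"}
masked = mask_infrequent_words("De kat is een dier", frequent)
features = char_ngrams(masked, n=3)
```

The resulting counts could then be fed to a linear classifier such as a support vector machine. Shrinking the frequent-word list masks more of the text, which hides more topical vocabulary at the cost of discarding potentially discriminative information.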
Secondly, a score-based likelihood ratio system was created to adapt the computational authorship attribution methods for use in forensics. This system is based on kernel density estimators and uses cross-calibration to handle the small number of training and calibration texts of the suspect. For most methods, performance was in line with their performance outside the likelihood ratio system; the exception was the BERT-based methods, which significantly underperformed when part of a likelihood ratio system. This is likely caused by the combination of cross-calibration and the randomness in finetuning BERT models.
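The core idea of a score-based likelihood ratio can be sketched as follows: one kernel density estimate is fitted to classifier scores from same-author comparisons, another to scores from different-author comparisons, and the likelihood ratio for a new score is the ratio of the two densities. The Gaussian kernel, the fixed bandwidth, the function names, and the toy scores are all assumptions for illustration; cross-calibration is omitted entirely.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function: the average of Gaussian kernels centred on the samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def likelihood_ratio(score, same_author_scores, different_author_scores, bandwidth=0.1):
    """Ratio of the same-author density to the different-author density at `score`."""
    p_same = gaussian_kde(same_author_scores, bandwidth)(score)
    p_diff = gaussian_kde(different_author_scores, bandwidth)(score)
    return p_same / p_diff


# Toy calibration scores (hypothetical, not taken from the thesis).
same_scores = [0.7, 0.8, 0.9]
diff_scores = [0.1, 0.2, 0.3]

lr_high = likelihood_ratio(0.8, same_scores, diff_scores)  # supports the same-author hypothesis
lr_low = likelihood_ratio(0.2, same_scores, diff_scores)   # supports the different-author hypothesis
lr_mid = likelihood_ratio(0.5, same_scores, diff_scores)   # roughly neutral for these symmetric toy scores
```

A likelihood ratio above 1 supports the hypothesis that the suspect wrote the text, and a value below 1 supports the alternative. In practice the bandwidth would need to be chosen with care, since few calibration scores make the density estimates, and therefore the ratio, unstable in the tails.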
Thirdly, authorship attribution methods should be topic-robust, such that their attribution is not biased by the topic of a text. We introduced two new metrics to measure the topic-robustness of authorship attribution methods: ‘topic impact’ and ‘conversation impact’. Each metric can only be used on a specific type of corpus: the topic impact can be computed on topic-controlled corpora, and the conversation impact on conversational corpora. To study whether both metrics measure the topic-robustness of authorship attribution methods on their respective corpus type, we computed the correlation between the results of the two metrics across a range of authorship attribution methods.
We found a correlation of 0.68. As a result, we cannot conclude that the conversation impact is a perfect metric to measure the topic-robustness of methods using conversational corpora, but it does give a good indication of large differences between methods.
Using this new metric, we found that our best-performing methods suffered from a high conversation impact and, as a result, may have low topic-robustness. When more of the infrequent words were masked, the conversation impact decreased, but so did the performance. A trade-off between high performance and high topic-robustness must therefore be made when a model is chosen for real forensic casework. The conversation impact metric we proposed can help quantify these effects on forensically relevant corpora and thereby support better-informed choices.