Value-Sensitive Rejection of Machine Learning Predictions for Hate Speech Detection

Abstract

Hate speech detection on social media platforms remains a challenging task. Manual moderation by humans is the most reliable but infeasible at scale, whereas machine learning models for detecting hate speech are scalable but unreliable, as they often perform poorly on unseen data. Human-AI collaborative systems, which combine human reliability with the scalability of machine learning, therefore offer great potential for detecting hate speech. While methods for task handover in human-AI collaboration exist that consider the costs of incorrect predictions, insufficient attention has been paid to estimating these costs. In this work, we propose a value-sensitive rejector that automatically rejects machine learning predictions whose confidence is too low, taking into account users' perception of different types of machine learning predictions. We conducted a crowdsourced survey study with 160 participants to evaluate their perception of correct, incorrect, and rejected predictions in the context of hate speech detection. We introduce magnitude estimation, an unbounded scale, as the preferred method for measuring user perception of machine predictions. The results show that magnitude estimation can be used reliably to measure users' perception. We integrate the user-perceived values into the value-sensitive rejector and apply the rejector to several state-of-the-art hate speech detection models. The results show that the value-sensitive rejector can help determine when to accept or reject predictions so as to achieve optimal model value. Furthermore, the best model can differ when optimizing model value rather than more widely used metrics, such as accuracy.
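
As a rough illustration of the rejection idea described above, the sketch below compares the expected user-perceived value of accepting a prediction against the value of rejecting it and deferring to a human. The function name `should_reject` and the example values are hypothetical and not taken from the paper; it is a minimal sketch assuming rejection is chosen whenever the expected value of accepting falls below the value of deferral.

```python
# Minimal sketch of a value-sensitive rejection rule (illustrative only;
# the names and threshold logic are assumptions, not the paper's exact
# formulation).

def should_reject(confidence: float,
                  v_correct: float,
                  v_incorrect: float,
                  v_reject: float) -> bool:
    """Reject a prediction when its expected user-perceived value is
    lower than the value of deferring the item to a human moderator."""
    # Expected value of accepting the model's prediction: with probability
    # `confidence` it is correct, otherwise it is incorrect.
    expected_accept = confidence * v_correct + (1.0 - confidence) * v_incorrect
    return expected_accept < v_reject

# Hypothetical example: users rate a correct prediction +1.0, an incorrect
# prediction -4.0, and a rejection (deferral to a human) -0.5.
print(should_reject(confidence=0.6, v_correct=1.0, v_incorrect=-4.0, v_reject=-0.5))
# -> True: at 60% confidence the expected value of accepting (-1.0) is
#    worse than deferring (-0.5), so the prediction is rejected.
```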