On the fairness of crowdsourced training data and Machine Learning models for the prediction of subjective properties. The case of sentence toxicity

To be or not to be #$@&%*! toxic? To be or not to be fair?


Abstract

Training machine learning (ML) models for natural language processing usually requires large amounts of data, which are often acquired through crowdsourcing. In crowdsourcing, crowd workers annotate data samples according to one or more properties, such as the sentiment of a sentence, the violence of a video segment, or the aesthetics of an image. To ensure the quality of the annotations, several workers annotate the same sample, and their annotations are combined into a single label using aggregation techniques such as majority voting.
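
As an illustration of this aggregation step, the minimal Python sketch below performs majority voting over per-worker annotations; the data layout (a mapping from sample ID to (worker ID, label) pairs) and the function name are our own assumptions for the example, not part of the thesis.

from collections import Counter

def majority_vote(annotations):
    """Aggregate per-worker annotations into one label per sample.

    `annotations` maps each sample id to a list of (worker_id, label)
    pairs collected through crowdsourcing; the returned dict keeps only
    the most frequent label for each sample.
    """
    aggregated = {}
    for sample_id, worker_labels in annotations.items():
        counts = Counter(label for _, label in worker_labels)
        # The most common label wins; ties are broken arbitrarily here.
        aggregated[sample_id] = counts.most_common(1)[0][0]
    return aggregated

# Three workers annotate the toxicity of one sentence; the dissenting
# vote of worker w2 is discarded by the aggregation.
example = {"sentence_1": [("w1", "toxic"), ("w2", "not_toxic"), ("w3", "toxic")]}
print(majority_vote(example))  # {'sentence_1': 'toxic'}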

When the property to be annotated is subjective, the workers' annotations for the same sample might differ, yet all be valid. The way the annotations are aggregated can affect the fairness of the outputs of the trained model. For example, accounting only for the majority vote means ignoring the opinions of workers who disagree with the majority, and consequently discriminating against certain workers. Also, ML models are not always designed to account for individual opinions, for the sake of simplicity or performance. Finally, to the best of our knowledge, no method exists to assess the fairness of an ML algorithm predicting a subjective property. In this thesis we address these limitations by seeking an answer to the following research question: how can targeted crowdsourcing be used to increase the fairness of ML algorithms trained to predict subjective properties?

We investigate how annotation aggregation via majority voting creates a dataset bias towards the majority opinion, and how this dataset bias, combined with the current limitations of ML models, leads to algorithmic bias in the ML models trained on this dataset and to unfairness in the models' outputs. We assume that an ML model able to return each annotation of each worker is a fair model. We propose a new method for evaluating the fairness of ML models, and a methodology to highlight and mitigate potential unfairness based on the creation of adapted training datasets and ML models.
Although our work is applicable to any kind of label aggregation for any data subject to multiple interpretations, we focus on the effects of the bias introduced by majority voting for the task of predicting sentence toxicity.
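
To make the evaluation idea concrete, the sketch below measures how often a model reproduces each worker's own annotations rather than only the majority label; the predict(sample_id, worker_id) interface and the agreement-rate summaries are illustrative assumptions, not the exact metric developed in the thesis.

from statistics import pstdev

def per_worker_agreement(predict, test_annotations):
    """Compute, for each worker, how often the model's prediction matches
    that worker's own annotation.

    `predict(sample_id, worker_id)` is a hypothetical model interface that
    returns the label predicted for that worker on that sample.
    `test_annotations` maps sample ids to lists of (worker_id, label) pairs.
    A large spread of agreement rates across workers suggests the model is
    unfair towards some (often minority) opinions.
    """
    hits, totals = {}, {}
    for sample_id, worker_labels in test_annotations.items():
        for worker_id, label in worker_labels:
            totals[worker_id] = totals.get(worker_id, 0) + 1
            if predict(sample_id, worker_id) == label:
                hits[worker_id] = hits.get(worker_id, 0) + 1
    rates = {w: hits.get(w, 0) / totals[w] for w in totals}
    # Summaries: the worst-served worker and the spread across workers.
    return rates, min(rates.values()), pstdev(rates.values())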

Our results show that the fairness evaluation method that we create makes it possible to identify unfair algorithms and to compare algorithmic fairness, and that the resulting fairness metric can be used in the training process of ML models. The experiments on the models show that we can mitigate the biases resulting from majority voting and increase fairness towards minority opinions. This holds provided that the workers' individual information and each of their annotations are taken into account when training adapted models, rather than relying only on the aggregated annotations, and that the dataset is resampled according to criteria matching the favoured aspect of fairness. We also highlight that more work needs to be done to develop crowdsourcing methods to collect high-quality annotations of subjective properties, possibly at low cost.
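
One way to take the workers' individual information into account is to condition the model on a worker identifier in addition to the sentence, so that it predicts a label per (sentence, worker) pair. The PyTorch sketch below is a minimal illustration under our own choices of encoder and layer sizes, not the exact architecture used in the thesis.

import torch
import torch.nn as nn

class PerWorkerToxicityClassifier(nn.Module):
    """Predict a toxicity label per (sentence, worker) pair instead of a
    single aggregated label per sentence. The bag-of-words encoder and the
    layer sizes are placeholders chosen for brevity."""

    def __init__(self, vocab_size, num_workers, embed_dim=64, worker_dim=16):
        super().__init__()
        # Toy sentence encoder: average of token embeddings.
        self.sentence_encoder = nn.EmbeddingBag(vocab_size, embed_dim)
        # Learned representation of each crowd worker.
        self.worker_embedding = nn.Embedding(num_workers, worker_dim)
        self.classifier = nn.Sequential(
            nn.Linear(embed_dim + worker_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 2),  # toxic / not toxic
        )

    def forward(self, token_ids, offsets, worker_ids):
        sentence = self.sentence_encoder(token_ids, offsets)
        worker = self.worker_embedding(worker_ids)
        # The prediction depends on both the sentence and the worker,
        # so minority opinions are not collapsed into the majority label.
        return self.classifier(torch.cat([sentence, worker], dim=-1))

Training such a model on every (sentence, worker, annotation) triple, rather than on majority-voted labels, is one way of preserving minority opinions; resampling these triples is then a natural place to balance the favoured aspect of fairness.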
