Is Wikipedia succeeding in reducing gender bias?

Assessing the development of gender bias in word embeddings from Wikipedia

Abstract

Large text corpora used for creating word embeddings (vectors which represent word meanings) often contain stereotypical gender bias. This unwanted bias then carries over into the word embeddings and into downstream applications in natural language processing. Preventing and reducing it requires more knowledge about how the bias behaves. This paper contributes by showing how gender bias in word embeddings trained on Wikipedia develops over time. Quantifying the gender bias over time shows that words in Science and Arts have become more female-biased. Family and Career have stereotypical biases towards female and male words, respectively, and these biases have steadily decreased since 2006. This provides new insight into what should be done to make Wikipedia more gender neutral, and into how much the time of writing can matter when considering biases in word embeddings trained on Wikipedia or on other text corpora.
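As an illustration of what quantifying gender bias over time could look like, the sketch below scores a word by its mean cosine similarity to female attribute words minus its mean cosine similarity to male attribute words, and compares that score across two snapshots. The abstract does not name the metric used, so this association-difference score is an assumption, as are the function names, the attribute words and the toy two-dimensional vectors, which are invented for illustration and are not results from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_association(word_vec, female_vecs, male_vecs):
    """Mean similarity to female words minus mean similarity to male words.

    Positive values lean female, negative values lean male, and zero is
    neutral. This is an assumed metric, not necessarily the paper's.
    """
    female = sum(cosine(word_vec, f) for f in female_vecs) / len(female_vecs)
    male = sum(cosine(word_vec, m) for m in male_vecs) / len(male_vecs)
    return female - male


# Hypothetical toy embeddings, one small vocabulary per snapshot year.
# Real embeddings would be trained on a Wikipedia dump from each year.
toy_snapshots = {
    2006: {"she": [0.0, 1.0], "he": [1.0, 0.0], "career": [0.9, 0.1]},
    2020: {"she": [0.0, 1.0], "he": [1.0, 0.0], "career": [0.6, 0.4]},
}

# Score the same target word in every snapshot to see how it moves.
scores = {
    year: gender_association(emb["career"], [emb["she"]], [emb["he"]])
    for year, emb in toy_snapshots.items()
}

for year in sorted(scores):
    print(year, round(scores[year], 3))
```

In this toy data the score for "career" is negative in both years but closer to zero in the later one, which is the shape of trend the abstract describes for Career. A fuller analysis would average over sets of target and attribute words rather than single words.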