Surfacing Differences in Practices When Building Fair Machine Learning Systems with Fairness Toolkits: An Empirical Study

Abstract

Identifying and mitigating the various risks and harms of using Machine Learning models in industry is an essential task, particularly because these models may produce harmful outcomes for stakeholders, including unfair or discriminatory results. Consequently, there has been substantial research into the concepts of fairness and its metrics, bias and its mitigation, and algorithmic harms and their sources. Various toolkits have been created to guide practitioners in reflecting on these topics and to suggest algorithmic solutions for mitigating these risks. However, it is not yet known how widely these toolkits are used or how useful practitioners perceive them to be. In this project, practitioners were interviewed to determine to what extent the envisioned practices of practitioners without experience with fairness toolkits differ from those of practitioners with such experience. The two toolkits considered were IBM AI Fairness 360 and Microsoft Fairlearn. The data collected from the interviews suggests that there may be fewer differences between the practices of practitioners with and without toolkit experience than between those with and without training or work roles in ethics and fairness in ML. This suggests that experience with a toolkit itself is not indicative of a more thorough approach to identifying and mitigating harms in fair Machine Learning.