Characterising AI Weakness in Detecting Personal Data from Images By Crowds

This thesis investigates how crowdsourcing can help characterize the weaknesses of machine learning models that detect privacy-sensitive data in images. Before devising such a method, we first need to define what we consider privacy-sensitive data. We took the General Data Protection Regulation (GDPR) as a starting point and ran a crowdsourcing task to see how workers interpret this regulation. Interpreting legal texts can be difficult: they leave room for interpretation, and the perception of a legal text can change over time. Therefore, we need to combine the input of the crowd with our own to operationalize the regulation for this context. Next, we took a machine learning model for detecting privacy-sensitive data in images and retrieved its saliency maps, which help explain the inner workings of the model. These saliency maps were then inspected in a second crowdsourcing task, using the established privacy definition, to identify the model's strengths and weaknesses. The results show that crowd workers can be used efficiently to find the strengths and weaknesses of a machine learning model while keeping the privacy definition in mind. Workers are able to apply their views about privacy consistently across different images, and the process also increases the trust people have in the machine learning model. This shows that crowdsourcing can be used efficiently even in the fairly difficult context of privacy, and it paves the way for a more sophisticated approach to identifying privacy-sensitive elements in images, and even for contexts other than privacy.