Towards a human in the loop approach to preserve privacy in images

Abstract

Current artificial intelligence and information retrieval systems need to be trained on large amounts of data to achieve satisfactory performance. A popular way to create such datasets is crowdsourcing; however, the content to be annotated may contain private or sensitive information that workers can extract, which limits the applicability of crowdsourced data annotation in privacy-sensitive contexts. In this paper, we survey the literature and find that current approaches in crowdsourcing and machine learning are unsatisfactory: they either hinder workers' ability to annotate the data, increase the overall cost, or lack generalizability. We identify current challenges, propose and elaborate a hybrid human-machine approach to detecting private information in images, discuss its features, and propose future directions.