Publishing Privacy Sensitive Open Data using an Automated Decision Support System

In recent years, the idea of Open Data has gained popularity, mainly due to the initiative of the president of the USA, Barack Obama, who started a campaign promoting Open Government and ordered government data to be made available as Open Data. In essence, Open Data is data published online that can be used and republished without restrictions imposed by mechanisms of control. It is believed that most government data can be leveraged as fuel for innovation. A survey by TNO for Dutch policymakers concluded that Open Data, including government data, has substantial economic value. The motivation and intuition behind our research is that the publication of large sets of (linked) Open Data may lead to unforeseen breaches of data sensitivity and privacy. We therefore interviewed policymakers who are responsible for the publication of Open Data. From these interviews, we conclude that they have no clear view of the possible issues surrounding the publication of Open Data; the data publishing task is usually delegated to institutions such as Statistics Netherlands (CBS). Moreover, a literature survey showed that current research focuses solely on data privacy and ignores other forms of unwanted publication, such as the publication of sensitive data. This unclear view of the possible issues, together with the narrow focus of the literature on data privacy, has motivated us to propose a new data publishing process supported by an automated decision support tool. For this system we present an architecture and a reference implementation. Furthermore, we propose a more extensive definition of data sensitivity. We also present definitions of privacy and utility metrics, and we use these metrics to compare anonymization algorithms.
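
To make the idea of comparing anonymization algorithms by a privacy metric and a utility metric concrete, the following is a minimal sketch and not the metrics defined in this work. It assumes k-anonymity as the privacy metric and the fraction of unchanged cell values as the utility metric; the function names, the sample records and the generalization rule are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records that share the same
    quasi-identifier values (assumed privacy metric: higher is safer)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def retained_fraction(original, anonymized, columns):
    """Fraction of cell values left unchanged by anonymization
    (assumed utility metric: higher means less information lost)."""
    total = len(original) * len(columns)
    if total == 0:
        return 1.0
    unchanged = sum(
        o[c] == a[c] for o, a in zip(original, anonymized) for c in columns
    )
    return unchanged / total


def generalize(record):
    """Hypothetical anonymization step: truncate the postal code to two
    digits and replace the age by its ten-year band."""
    low = record["age"] // 10 * 10
    return {
        "zip": record["zip"][:2] + "**",
        "age": f"{low}-{low + 9}",
        "diagnosis": record["diagnosis"],
    }


# Invented sample data, for illustration only.
original = [
    {"zip": "2511", "age": 34, "diagnosis": "flu"},
    {"zip": "2512", "age": 36, "diagnosis": "asthma"},
    {"zip": "2628", "age": 51, "diagnosis": "flu"},
    {"zip": "2629", "age": 58, "diagnosis": "diabetes"},
]
anonymized = [generalize(r) for r in original]

quasi_identifiers = ["zip", "age"]
columns = ["zip", "age", "diagnosis"]

k_before = k_anonymity(original, quasi_identifiers)
k_after = k_anonymity(anonymized, quasi_identifiers)
utility = retained_fraction(original, anonymized, columns)
print(k_before, k_after, round(utility, 3))
```

Under these assumptions, a decision support tool could run several candidate algorithms on the same data set and report both numbers side by side, which makes the trade-off between stronger privacy and retained utility explicit to the publisher.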