Privacy-preserving cross-device tracking

Abstract

Online advertising is a multi-billion-dollar industry and a primary source of income for most publishers offering free content on the Web. Online behavioural advertising refers to the practice of serving targeted ads to online users based on their potential interests. To infer these interests, online advertisers aggregate and process vast amounts of behavioural data collected from the browsing activities of Internet users. Until recently, online advertisers used a device-centric approach to studying user behaviour and delivering targeted ads. With the development and widespread adoption of smart devices, it has become commonplace for a person to own, and browse the Web from, multiple devices, and online advertisers have been quick to adapt.

Cross-device tracking (CDT) is the practice of identifying and tracking the user of a device rather than the device itself. This is usually achieved by grouping devices according to the likelihood that they belong to the same user. By employing CDT, advertisers can build more comprehensive user profiles from a user's overall online behaviour and serve advertisements tailored to that user's interests on all of their devices.
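One simple way to turn pairwise same-user likelihoods into device groups is to link every pair whose score clears a threshold and take the connected components. The Python sketch below does this with a union-find structure. The function name `group_devices`, the threshold of 0.5 and the example scores are illustrative assumptions, not a description of how any particular advertiser, or PCDT, forms groups.

```python
from typing import Dict, List, Tuple


def group_devices(devices: List[str],
                  pair_scores: Dict[Tuple[str, str], float],
                  threshold: float = 0.5) -> List[List[str]]:
    """Cluster devices into user groups by linking pairs whose
    same-user likelihood meets the threshold (hypothetical helper)."""
    parent = {d: d for d in devices}

    def find(d: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for d in devices:
        groups.setdefault(find(d), []).append(d)
    # Deterministic output order: devices sorted within and across groups.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Made-up likelihoods that two devices belong to the same person.
scores = {("phone", "laptop"): 0.9, ("laptop", "tablet"): 0.7, ("tablet", "tv"): 0.2}
print(group_devices(["phone", "laptop", "tablet", "tv"], scores))
# [['laptop', 'phone', 'tablet'], ['tv']]
```

Linking is transitive, so a chain of moderately likely pairs can merge devices that were never scored against each other directly; a production system may prefer a stricter clustering rule.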

While these practices greatly benefit both advertisers and their clients, they compromise the privacy of online users through the amount and nature of the data collected. Various solutions have been proposed over the years for performing online behavioural advertising in a privacy-preserving manner. No comparable privacy-conscious alternatives, however, have been designed for CDT, even though the data collected for CDT overlaps considerably with the data collected for serving targeted ads.

In this thesis, we aim to offer a technological solution to the privacy issues posed by collecting and processing customer data for the purpose of cross-device tracking. To this end, we design PCDT, a protocol that uses fully homomorphic encryption and keyed hash functions to implement both deterministic and probabilistic CDT techniques that operate on encrypted data.
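Deterministic CDT links devices that share an exact identifier, such as the e-mail address used to log in. The Python sketch below shows how a keyed hash can support such matching: a hypothetical key holder blinds normalized identifiers with HMAC-SHA256 (an assumed choice of keyed hash), and a matching server groups devices by equal tokens without seeing the identifiers. Who holds the key, the normalization step and all names here (`blind_identifier`, `match_devices`, the device IDs) are illustrative assumptions rather than PCDT's actual message flow.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict
from typing import Dict, List


def blind_identifier(key: bytes, identifier: str) -> str:
    """Keyed hash (HMAC-SHA256) of a normalized identifier.
    Without the key, a party cannot recompute tokens for guessed identifiers."""
    normalized = identifier.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def match_devices(tokens: Dict[str, str]) -> List[List[str]]:
    """Matching-server side: devices with equal tokens are linked,
    without the server learning the underlying identifiers."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for device_id, token in tokens.items():
        buckets[token].append(device_id)
    # Only tokens shared by two or more devices produce a link.
    return sorted(sorted(ids) for ids in buckets.values() if len(ids) > 1)


# Hypothetical setup: the key holder blinds login e-mails; the matching
# server only ever receives device IDs paired with opaque tokens.
key = secrets.token_bytes(32)
logins = {
    "phone-1": "Alice@example.com",
    "laptop-7": "alice@example.com ",
    "tablet-3": "bob@example.com",
}
tokens = {dev: blind_identifier(key, email) for dev, email in logins.items()}
print(match_devices(tokens))  # [['laptop-7', 'phone-1']]
```

Because the hash is deterministic, equal identifiers always yield equal tokens, which is what makes fast matching possible; the trade-off is that the matching server still learns which devices share an identifier, though not what that identifier is.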

PCDT operates in a two-server setting, in which both servers engage in privacy-preserving computations to perform CDT while concealing the secret device data from each other through cryptographic means. The security of PCDT is analysed under the semi-honest (honest-but-curious) model, in which parties attempt to learn as much as possible from the information presented to them but do not deviate from the protocol.

PCDT performs deterministic CDT in a privacy-preserving manner by concealing the relevant device data with a keyed hash function: the deterministic hashes allow devices to be associated quickly, while the secrecy of the hashing key keeps the plaintext data secret. To perform privacy-preserving probabilistic CDT, PCDT uses fully homomorphic encryption to train and evaluate gradient boosting decision tree models on encrypted device data. To the best of our knowledge, PCDT is the first protocol designed to perform privacy-preserving CDT.
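Under fully homomorphic encryption, a server cannot branch on the result of an encrypted comparison, so tree models are commonly evaluated in a branch-free form: every split is evaluated, and each leaf value is weighted by the product of the comparison bits along its path. The following is a plaintext Python sketch of that arithmetic structure for a small gradient-boosted ensemble; it is not necessarily how PCDT evaluates its models. The tree layout (`HeapTree`), the helper names and the two pair features are hypothetical, and `indicator_ge` stands in for what would be an encrypted comparison (often a polynomial approximation) in an FHE scheme.

```python
from dataclasses import dataclass
from math import exp
from typing import List, Tuple


@dataclass
class HeapTree:
    """Complete binary tree stored in heap order (node k -> children 2k+1, 2k+2)."""
    depth: int
    splits: List[Tuple[int, float]]  # (feature index, threshold) per internal node
    leaves: List[float]              # 2**depth leaf values, left to right


def indicator_ge(x: float, t: float) -> int:
    # Plaintext stand-in for a homomorphic comparison; under FHE this would be
    # an encrypted bit or a polynomial approximation of a step function.
    return 1 if x >= t else 0


def eval_tree_branch_free(tree: HeapTree, x: List[float]) -> float:
    """Evaluate all splits, then sum leaf values weighted by path products.
    Control flow depends only on the public tree shape, never on the inputs."""
    bits = [indicator_ge(x[f], t) for f, t in tree.splits]
    total = 0.0
    for leaf in range(2 ** tree.depth):
        weight, node = 1, 0
        # Walk from the root; the leaf index's bits (MSB first) encode the path.
        for level in range(tree.depth - 1, -1, -1):
            go_right = (leaf >> level) & 1
            b = bits[node]
            weight *= (b if go_right else 1 - b)
            node = 2 * node + 1 + go_right
        total += weight * tree.leaves[leaf]
    return total


def gbdt_probability(trees: List[HeapTree], x: List[float], base: float = 0.0) -> float:
    """Sum tree outputs into a margin and map it to a same-user probability."""
    margin = base + sum(eval_tree_branch_free(t, x) for t in trees)
    return 1.0 / (1.0 + exp(-margin))


# Hypothetical model over two made-up pair features: [shared_ip_ratio, time_overlap].
tree1 = HeapTree(depth=1, splits=[(0, 0.5)], leaves=[-1.0, 1.5])
tree2 = HeapTree(depth=2, splits=[(1, 0.3), (0, 0.2), (0, 0.8)],
                 leaves=[-0.5, 0.0, 0.2, 0.8])
print(gbdt_probability([tree1, tree2], [0.9, 0.6]))  # sigmoid of margin 1.5 + 0.8
```

Since every leaf is visited regardless of the input, the work grows with 2 to the power of the tree depth, which is one reason shallow trees tend to suit encrypted evaluation.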