Optimised Private Set Intersection for Vertical Federated Tree Models

Abstract

In recent years, rapid advances in big data, machine learning, and artificial intelligence have led to a corresponding rise in privacy concerns. One approach to addressing these concerns is federated learning. In this thesis, we study vertical federated learning based on tree models. We have built a system that performs both entity resolution through private set intersection (PSI) and vertical federated learning (VFL). In this system, we have implemented an optimisation that pre-sorts the data per feature before VFL starts. We have also created a privacy framework in which we define four levels of privacy. The pre-sorting optimisation did not affect the privacy level of the system. Our results show that pre-sorting the data lowers the overall training time; how much depends on the number of entities and the number of features held by the passive party. From our results, we estimate the reduction in training time to be 0.3654 seconds per feature and 0.2093 seconds per 1000 entities.
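To illustrate the general idea behind pre-sorting per feature, the following is a minimal plain-Python sketch. It is not the implementation from the thesis: the function names `presort_features` and `candidate_thresholds` are hypothetical, the data is unencrypted toy data, and the PSI and federated steps are left out entirely. It only shows that each feature's row order can be computed once up front and then reused whenever split candidates for that feature are needed.

```python
def presort_features(rows):
    """For each feature, return the row indices ordered by that feature's value."""
    n_features = len(rows[0])
    # sorted() is stable, so rows with equal values keep their original order.
    return [
        sorted(range(len(rows)), key=lambda i, f=f: rows[i][f])
        for f in range(n_features)
    ]


def candidate_thresholds(rows, order, feature):
    """Midpoints between consecutive distinct values, read off the stored order."""
    values = [rows[i][feature] for i in order[feature]]
    return [(a + b) / 2 for a, b in zip(values, values[1:]) if a != b]


# Toy data (assumed): four entities, two features.
rows = [[3.0, 10.0], [1.0, 30.0], [2.0, 20.0], [1.0, 40.0]]

# Computed once, before training starts, then reused for every split search.
order = presort_features(rows)
```

Under these assumptions, `candidate_thresholds(rows, order, 0)` reads the values of feature 0 in the stored order without sorting again. In a tree learner that would otherwise sort a feature each time it searches for a split, this is where the saving would come from.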