Optimised Private Set Intersection for Vertical Federated Tree Models

Master Thesis (2024)
Author(s)

M.C.H. Li (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

R. Hai – Mentor (TU Delft - Web Information Systems)

D. Zhan – Mentor (TU Delft - Web Information Systems)

C. Lofi – Mentor (TU Delft - Web Information Systems)

Jérémie Decouchant – Graduation committee member (TU Delft - Data-Intensive Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2024 Martin Li
Publication Year
2024
Language
English
Graduation Date
22-01-2024
Awarding Institution
Delft University of Technology
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In recent years, rapid advances in big data, machine learning, and artificial intelligence have led to a corresponding rise in privacy concerns. Federated learning is one way to address these concerns. In this thesis, we study vertical federated learning (VFL) based on tree models. We have built a system that performs both entity resolution through private set intersection (PSI) and VFL. In this system, we have implemented an optimisation that pre-sorts the data per feature before VFL starts. We have also created a privacy framework in which we define four levels of privacy; under this framework, the optimisation does not affect the privacy level of the system. Our results show that pre-sorting the data lowers the overall training time, by an amount that depends on the number of entities and features of the passive party. From our results, we estimate the speed-up to be 0.3654 seconds per feature and 0.2093 seconds per 1000 entities.
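
The general idea of pre-sorting per feature can be illustrated with a minimal sketch. It is not taken from the thesis: the function names, the plain-list data layout, and the midpoint rule for candidate split thresholds are all assumptions made for illustration, and the sketch leaves out the PSI and privacy mechanisms entirely. It only shows how a party could sort each feature column once up front and reuse that ordering whenever a tree node needs candidate splits.

```python
def presort_features(rows):
    """Return, per feature, the row indices ordered by ascending feature value.

    Hypothetical helper: computed once, before any tree is trained.
    """
    n_features = len(rows[0])
    # sorted() is stable, so rows with equal values keep their original order.
    return [
        sorted(range(len(rows)), key=lambda i, f=f: rows[i][f])
        for f in range(n_features)
    ]


def candidate_thresholds(rows, order, feature):
    """Midpoints between consecutive distinct values, read off the stored order.

    Assumed split rule for illustration; no sorting happens here.
    """
    values = [rows[i][feature] for i in order[feature]]
    return [(a + b) / 2 for a, b in zip(values, values[1:]) if a != b]


# Toy data: 4 entities, 2 features held by one party.
rows = [
    [3.0, 10.0],
    [1.0, 30.0],
    [2.0, 20.0],
    [1.0, 40.0],
]
order = presort_features(rows)
thresholds = candidate_thresholds(rows, order, feature=0)
```

In this sketch the sorting cost is paid once per feature rather than at every split search, which is the kind of saving the abstract attributes to pre-sorting.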

Files

MSc_CS_Thesis_Martin_Li.pdf
(pdf | 1.46 Mb)
License info not available