Optimised Private Set Intersection for Vertical Federated Tree Models

Master Thesis (2024)

Author(s)

M.C.H. Li (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

R. Hai – Mentor (TU Delft - Web Information Systems)

D. Zhan – Mentor (TU Delft - Web Information Systems)

C. Lofi – Mentor (TU Delft - Web Information Systems)

Jérémie Decouchant – Graduation committee member (TU Delft - Data-Intensive Systems)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Privacy, Private Set Intersection, Vertical Federated Learning

To reference this document use:

https://resolver.tudelft.nl/uuid:0304a61b-14df-44a5-8a72-84b5ea5d1eb6

Publication Year

2024

Language

English

Graduation Date

22-01-2024

Awarding Institution

Delft University of Technology

Programme

Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In recent years, rapid advances in big data, machine learning, and artificial intelligence have been accompanied by growing privacy concerns. Federated learning is one approach to addressing these concerns. This thesis considers vertical federated learning (VFL) based on tree models. We built a system that performs both entity resolution through private set intersection (PSI) and VFL. In this system, we implemented an optimisation that pre-sorts the data per feature before VFL training starts. We also defined a privacy framework with four levels of privacy, and the optimisation did not change the privacy level of the system. Our results show that pre-sorting the data lowers the overall training time; the size of the reduction depends on the number of entities and the number of features held by the passive party. From our results, we estimate the speed-up at 0.3654 seconds per feature and 0.2093 seconds per 1000 entities.
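
The pre-sorting idea can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the column layout, and the toy data are hypothetical, and the rows are assumed to be already aligned by PSI. The passive party sorts each feature once before training, so that candidate split thresholds can later be read off the sorted order instead of re-sorting at every tree node.

```python
from typing import Dict, List


def presort_features(columns: Dict[str, List[float]]) -> Dict[str, List[int]]:
    """For each feature, return the row indices ordered by ascending value.

    Hypothetical helper: computed once, before VFL training starts.
    """
    return {
        name: sorted(range(len(values)), key=values.__getitem__)
        for name, values in columns.items()
    }


def candidate_thresholds(values: List[float], order: List[int]) -> List[float]:
    """Midpoints between consecutive distinct values, using the pre-sorted order."""
    thresholds = []
    for a, b in zip(order, order[1:]):
        if values[a] != values[b]:
            thresholds.append((values[a] + values[b]) / 2)
    return thresholds


# Toy data (assumed): two features held by the passive party, four entities.
columns = {
    "age": [41.0, 23.0, 35.0, 23.0],
    "income": [3.2, 5.1, 4.0, 2.7],
}

sorted_index = presort_features(columns)
age_thresholds = candidate_thresholds(columns["age"], sorted_index["age"])
```

In this sketch the sorting cost is paid once per feature, which is consistent with the abstract's observation that the saving grows with the number of features and entities. How the thesis integrates the sorted order into tree construction is not stated in the abstract.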

Files

MSc_CS_Thesis_Martin_Li.pdf

(pdf | 1.46 MB)

License info not available