Watermarking of numerical datasets used for ML

A DWT approach for watermarking numerical datasets

Bachelor Thesis (2024)

Author(s)

M.C. Crăciun (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Z. Erkin – Mentor (TU Delft - Cyber Security)

Devris Isler – Mentor

A. Katsifodimos – Graduation committee member (TU Delft - Data-Intensive Systems)

Faculty

Electrical Engineering, Mathematics and Computer Science

Machine Learning, Discrete wavelet transform, Watermarking

To reference this document use:

https://resolver.tudelft.nl/uuid:30c84ab8-7731-498e-b541-4e5ef9b5a9ce

Publication Year

2024

Language

English

Graduation Date

20-06-2024

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

AI and machine learning have attracted great interest in the last couple of years, with applications in many domains. Training these models into useful and desirable tools requires a large amount of data. Such data is expensive to collect and has become one of the most valuable commodities of this century. As the value of data increases, protecting this intellectual property becomes increasingly relevant. Watermarking is widely used to protect media, but watermarking of non-media data has not been researched as thoroughly. In this paper, an adaptation of a common watermarking technique, DWT (discrete wavelet transform) watermarking, is applied to two datasets used for machine learning. In signal watermarking this technique is invisible and robust, but its performance on numerical datasets has not been previously researched. A previously devised algorithm was used and adjusted to better fit dataset watermarking. To assess the quality of the watermark, the marked data was subjected to create, remove, update and zero-out attacks. In addition, multiple machine learning models were trained on the marked data. Initial results show that the proposed technique performs well in terms of invisibility: models trained on the marked data obtain similar or better accuracy than models trained on the original data. However, the watermark is quite sensitive to attacks: even small modifications, affecting less than 1% of the data, can break the signature.
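The abstract does not spell out the embedding rule, so the following is only a generic illustration of DWT-domain watermarking on a single numeric column, not the algorithm used in the thesis. It assumes a one-level Haar transform, one signature bit per detail coefficient embedded by parity (QIM-style) quantization with a hypothetical step `delta`, and a signature derived from a secret key. All function names (`haar_forward`, `haar_inverse`, `signature_bits`, `embed`, `detect`) and parameters are hypothetical, and the data is synthetic.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One-level Haar DWT of an even-length sequence -> (approximation, detail)."""
    pairs = range(len(x) // 2)
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in pairs]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in pairs]
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward: rebuild the interleaved sample sequence."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return x


def signature_bits(key, n):
    """Hypothetical keyed signature: n pseudo-random bits seeded by a secret key."""
    rng = random.Random(key)
    return [rng.randint(0, 1) for _ in range(n)]


def embed(column, key, delta=0.5):
    """Embed one signature bit per detail coefficient via parity quantization."""
    if len(column) % 2:
        raise ValueError("this sketch expects an even-length column")
    approx, detail = haar_forward(column)
    bits = signature_bits(key, len(detail))
    marked = []
    for d, b in zip(detail, bits):
        k = round(d / delta)  # nearest lattice index
        if k % 2 != b:
            # Wrong parity: step to the neighbouring index on the side of d.
            k += 1 if d / delta > k else -1
        marked.append(k * delta)
    # Approximation coefficients are untouched, so each pair's mean is preserved.
    return haar_inverse(approx, marked)


def detect(column, key, delta=0.5):
    """Fraction of extracted bits matching the keyed signature (1.0 = intact)."""
    _, detail = haar_forward(column)
    bits = signature_bits(key, len(detail))
    extracted = [round(d / delta) % 2 for d in detail]
    return sum(e == b for e, b in zip(extracted, bits)) / len(bits)


# Usage on a synthetic numeric column (not data from the thesis).
rng = random.Random(0)
data = [rng.uniform(0, 100) for _ in range(1000)]
marked = embed(data, key="secret")
print(detect(marked, key="secret"))  # 1.0
# Each value moves by at most delta / sqrt(2) (small tolerance for float error).
print(max(abs(a - b) for a, b in zip(data, marked)) <= 0.5 / SQRT2 + 1e-9)  # True

# "Remove" attack: drop one row (plus the last, to keep the length even).
# Every pair after index 10 is now misaligned, so most bits become random.
attacked = marked[:10] + marked[11:-1]
print(detect(attacked, key="secret"))
```

In this sketch, deleting a single row shifts the pairing of every later sample, so roughly half of the extracted bits stop matching. This is one plausible mechanism behind the sensitivity to very small modifications reported above, although the thesis's own analysis and algorithm may differ. Keeping the approximation coefficients fixed preserves pairwise means, which is one way such a scheme can stay close to the original data for model training.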

Files

Research_paper.pdf

(pdf | 0.431 Mb)

License info not available