Detect the watermark through the training model
A watermarking scheme to protect numerical classification datasets
R. Li (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Devris Isler – Mentor (IMDEA Networks Institute)
P. Kellnhofer – Graduation committee member (TU Delft - Computer Graphics and Visualisation)
More Info
expand_more
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Datasets play an important role in machine learning technology. The quality of a machine learning model is highly dependent on the quality of the training dataset. Datasets are of great economic value and should be viewed as intellectual property. To protect the property rights of machine learning training datasets, we can make use of the watermarking technique. In this paper, we propose a dataset watermarking method for numerical datasets. Our method is modified from the radioactive data method, which is proposed for image datasets. Our method can detect if a linear classifier machine learning model has been trained with the watermarked dataset. The experiment results show that we can detect the watermark with more than 99% confidence with only 1% of data being modified. The watermarking method is not robust against data normalization but is robust against column dropping when the dimension of the dataset is high.