Detect the watermark through the training model

A watermarking scheme to protect numerical classification datasets

More Info


Datasets play an important role in machine learning technology. The quality of a machine learning model is highly dependent on the quality of the training dataset. Datasets are of great economic value and should be viewed as intellectual property. To protect the property rights of machine learning training datasets, we can make use of the watermarking technique. In this paper, we propose a dataset watermarking method for numerical datasets. Our method is modified from the radioactive data method, which is proposed for image datasets. Our method can detect if a linear classifier machine learning model has been trained with the watermarked dataset. The experiment results show that we can detect the watermark with more than 99% confidence with only 1% of data being modified. The watermarking method is not robust against data normalization but is robust against column dropping when the dimension of the dataset is high.