Image watermarking for Machine Learning datasets

Using SVD-based image watermarking techniques to watermark numerical ML datasets

Watermarking techniques for media have had the last 30 years to mature, whereas watermarking of non-media data is a much newer sub-domain [1]. Gathering data for machine learning algorithms is tedious and time-consuming, and the effort grows with the scale of these algorithms. Protecting datasets against illegal use or sale, and being able to prove that they are intellectual property, is therefore useful. In this paper, we answer the question: how can image watermarking techniques be applied to datasets for classification algorithms without degrading the dataset's quality? Algorithms that use the Singular Value Decomposition (SVD) of the data are often the basis of other image watermarking techniques based on matrix decomposition. Thus, if an SVD-based algorithm can be applied to a machine learning dataset, the other matrix-decomposition-based algorithms can be applied as well. This implies that a large part of the much older, media-targeted watermarking techniques can be applied to non-media datasets. We apply the watermarking technique described in [2] to a machine learning dataset. The resulting watermark provides decent imperceptibility and robustness against update, zero-out and insertion attacks, but it is held back by its weak robustness against deletion attacks. Nevertheless, we show that once an image watermark is found that withstands deletion attacks, it can be applied to machine learning datasets.
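
To make the general idea concrete, the following is a minimal sketch of one generic SVD-based embedding applied to a numerical feature matrix (rows as samples, columns as features). It is not the scheme from [2]. It assumes a simple additive rule on the singular values and a non-blind extraction that keeps the original decomposition as a secret key. The function names `embed_watermark` and `extract_watermark`, the strength parameter `alpha`, and the random example data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def embed_watermark(data, bits, alpha=0.05):
    """Add alpha * bit to the leading singular values of `data`.

    Illustrative assumption: a plain additive rule, not the method of [2].
    Returns the marked matrix and the key needed for extraction.
    """
    U, s, Vt = np.linalg.svd(data, full_matrices=False)
    bits = np.asarray(bits, dtype=float)
    if bits.size > s.size:
        raise ValueError("watermark has more bits than singular values")
    s_marked = s.copy()
    s_marked[: bits.size] += alpha * bits
    # Scale the columns of U by the modified singular values, then recompose.
    marked = (U * s_marked) @ Vt
    return marked, (U, s, Vt)


def extract_watermark(suspect, key, n_bits, alpha=0.05):
    """Non-blind extraction: project onto the original singular vectors."""
    U, s, Vt = key
    # For an unmodified marked matrix this recovers the marked singular values.
    s_est = np.diag(U.T @ suspect @ Vt.T)
    return ((s_est - s)[:n_bits] / alpha > 0.5).astype(int)


# Usage on a small synthetic dataset (random values, purely illustrative).
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 6))
bits = [1, 0, 1, 1, 0, 1]
marked, key = embed_watermark(data, bits)
recovered = extract_watermark(marked, key, len(bits))
max_change = np.abs(marked - data).max()
```

In this sketch no single entry of the dataset changes by more than `alpha`, which is one simple way to reason about imperceptibility. Because extraction relies on the row-wise alignment with the original `U`, removing rows from the marked matrix breaks it, which hints at why deletion attacks are difficult for this family of techniques.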