Binarization of Historical Watermarks
A Review of Thresholding Techniques Applied to Historical Watermark Images
Abstract
A watermark image is a scan of a historical paper document that contains a watermark, a motif embedded in the paper that provides valuable information about the document’s origins. Tools that automatically identify watermarks can make this information more accessible to researchers. This paper focuses on one specific binarization technique: thresholding. Thresholding selects a threshold value and uses it to convert an image to binary, so that one color represents foreground and the other represents background. Ideally, binarization isolates the watermark’s shape by representing it as foreground and removes unwanted information. This research compares the effectiveness of different thresholding techniques when applied to watermark images. Eight algorithms are selected from the literature, and a novel algorithm is proposed that seeks to improve on them for watermark images. The nine algorithms are evaluated quantitatively on synthetic data and qualitatively through a survey in which participants select the algorithm that appears best and rate it. The results show that no single algorithm works best for all images; however, a logical adaptive approach may work marginally better than the others. Additionally, the presented algorithms do not adequately remove non-watermark information from the images. Further research should analyze other binarization techniques in this context.
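To make the basic idea of thresholding concrete, the following is a minimal sketch of a single global threshold applied to a grayscale image stored as nested lists. It is not one of the nine algorithms evaluated in this paper. The function name `global_threshold`, the use of the mean intensity as the default threshold, and the `foreground_is_dark` flag are all assumptions made for illustration.

```python
def global_threshold(image, threshold=None, foreground_is_dark=True):
    """Binarize a grayscale image (list of rows) with one global threshold.

    Returns a same-shaped image where 1 marks foreground and 0 marks background.
    Hypothetical helper for illustration only.
    """
    pixels = [p for row in image for p in row]
    if threshold is None:
        # Assumption: use the mean intensity when no threshold is supplied.
        threshold = sum(pixels) / len(pixels)
    if foreground_is_dark:
        # Pixels darker than the threshold become foreground.
        return [[1 if p < threshold else 0 for p in row] for row in image]
    # Otherwise pixels brighter than the threshold become foreground.
    return [[1 if p > threshold else 0 for p in row] for row in image]


# Toy 3x3 "scan": a bright page with two darker pixels.
toy = [
    [200, 210, 205],
    [198, 90, 202],
    [201, 95, 207],
]
binary = global_threshold(toy)
```

On this toy input only the two darker pixels are marked as foreground. A single global value like this ignores local variation in the image, which is one reason adaptive approaches, which vary the threshold across the image, are also considered.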