Automatic Segmentation of Ships in Digital Images

A Deep Learning Approach

Abstract

Knowledge of adversaries during military missions at sea heavily influences decision making, which makes identification of unknown vessels an important task. Identifying surrounding vessels from visual data offers an alternative to Automatic Identification System (AIS) information, the current standard in vessel identification, which can be spoofed. One visual approach relies on human experts who manually identify vessels with the aid of a ship catalog. To minimize or potentially eliminate human error and performance limitations, there is strong interest in developing an automated vessel classification pipeline. One such pipeline, currently being developed at TNO, is capable of distinguishing over 500 separate classes. A crucial part of this pipeline is retrieving an accurate contour of a vessel from a digital image.
To address this challenge, this thesis proposes a deep learning pipeline that automatically segments a vessel image into background (e.g. sky and sea) and the object of interest (a vessel). Deep learning models based on Fully Convolutional Neural Networks (FCNs) have achieved high performance on the task of semantic segmentation. Several networks, namely CRF-RNN, PSPNet, DeepLab and Mask R-CNN, are employed to determine a baseline performance. We focus on identifying the causes of poor or failing segmentations and aim to construct a robust network capable of handling them. By sampling disturbances caused by ship distance and camera noise, we build augmented data sets that tune the networks to on-site images. Additionally, we run experiments to evaluate the influence of different levels of disturbance.
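As an illustration of the augmentation idea, the following is a minimal sketch and not the thesis implementation. It assumes zero-mean Gaussian noise as the camera-noise model and block averaging as a stand-in for increased ship distance; the function names, the noise range and the downscaling factors are all hypothetical.

```python
import numpy as np

def add_camera_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise (an assumed noise model) to a float image in [0, 1]."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

def simulate_distance(image, factor):
    """Mimic a more distant ship: average factor x factor blocks, then repeat pixels back to size."""
    h, w = image.shape[:2]
    h2, w2 = h - h % factor, w - w % factor  # crop so the blocks tile the image exactly
    cropped = image[:h2, :w2]
    blocks = cropped.reshape(h2 // factor, factor, w2 // factor, factor, -1)
    small = blocks.mean(axis=(1, 3))
    restored = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return restored.reshape(cropped.shape)

def augment(image, rng, max_sigma=0.1, factors=(1, 2, 4)):
    """Sample one disturbance level per image (hypothetical ranges) and apply both effects."""
    factor = int(rng.choice(factors))
    sigma = rng.uniform(0.0, max_sigma)
    return add_camera_noise(simulate_distance(image, factor), sigma, rng)

rng = np.random.default_rng(0)
clean = rng.random((8, 8, 3))   # stand-in for a clean training image
disturbed = augment(clean, rng)
```

Sampling the disturbance level per image, rather than fixing it, is one way to expose a network to a range of conditions during training.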
Previous approaches implementing the CRF-RNN network achieved top-1 and top-5 classification accuracies of 31.1% and 44.0% respectively. Employing the DeepLab network, trained to convergence on data augmented with artificial noise, we report top-1 and top-5 accuracies of 68.9% and 88.8% respectively. With an ensemble of classifiers, performance increases further to 73.0% top-1 and 91.7% top-5 accuracy. This best result is comparable to classification with human-annotated ship silhouettes, which reaches 73.4% top-1 and 91.3% top-5 accuracy. Finally, we show that training on a collection of different levels of image disturbance results in a network that is robust against increasing disturbance in images, while retaining performance on clean images.
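For readers unfamiliar with the metrics, the sketch below shows how top-k accuracy is commonly computed and how an ensemble might combine classifier outputs. It is illustrative only: the abstract does not state how the ensemble is combined, so averaging class scores is an assumption, and the scores and labels are made-up toy values.

```python
import numpy as np

def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    top_k = np.argsort(scores, axis=1)[:, -k:]        # indices of the k largest scores per row
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

def ensemble_average(score_list):
    """Combine classifiers by averaging their class scores (an assumed combination rule)."""
    return np.mean(np.stack(score_list, axis=0), axis=0)

# Toy scores from two hypothetical classifiers: 3 samples, 4 classes.
labels = np.array([1, 1, 3])
scores_a = np.array([[0.1, 0.6, 0.2, 0.1],
                     [0.5, 0.3, 0.1, 0.1],
                     [0.2, 0.2, 0.5, 0.1]])
scores_b = np.array([[0.2, 0.5, 0.2, 0.1],
                     [0.2, 0.6, 0.1, 0.1],
                     [0.1, 0.1, 0.1, 0.7]])

top1_single = top_k_accuracy(scores_a, labels, k=1)
top1_ensemble = top_k_accuracy(ensemble_average([scores_a, scores_b]), labels, k=1)
```

In this toy case the averaged scores classify more samples correctly than the first classifier alone, which is the kind of gain an ensemble is meant to provide.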