Autonomous vessels offer potential benefits in safety, operational efficiency, and environmental impact, but require reliable perception systems to function independently. This thesis investigates the reliability of camera-based detection of maritime docks, a class of static but visually diverse objects that are often difficult to distinguish from their surroundings. While extensive research exists for detecting ships and buoys, docks remain underexplored, with no publicly available datasets containing a significant number of labeled instances.
To address this gap, the Dordrecht Dock Dataset was developed, containing 30,761 frames recorded under real-world maritime conditions across eight distinct docks. A custom interpolation-based annotation tool enabled efficient labeling, achieving annotation speeds of up to 3,000 frames per hour. Two deep learning models—Faster R-CNN and YOLO11n—were trained and evaluated on this dataset using a consistent pipeline. Evaluation followed a leave-one-dock-out cross-validation strategy with fixed test sets and standard performance metrics, including mAP50, mAP50–95, F1 score, and inference speed.
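To illustrate the evaluation protocol, the sketch below shows one way a leave-one-dock-out split could be constructed; it is a minimal illustration only, and the frame/dock identifiers and the helper name are hypothetical, not taken from the thesis pipeline.

```python
from collections import defaultdict

def leave_one_dock_out_splits(frames):
    """Yield (held_out_dock, train_frames, test_frames) for each fold.

    `frames` is assumed to be an iterable of (frame_id, dock_id) pairs.
    In each fold, one dock's frames form the fixed test set and the
    remaining docks supply the training data.
    """
    by_dock = defaultdict(list)
    for frame_id, dock_id in frames:
        by_dock[dock_id].append(frame_id)

    for held_out in sorted(by_dock):
        test = by_dock[held_out]
        train = [f for dock, fs in by_dock.items() if dock != held_out for f in fs]
        yield held_out, train, test
```

In such a scheme, each fold's training set would be used to train both detectors, and the held-out dock's frames would be scored with the same metrics (mAP50, mAP50–95, F1) to measure generalization to unseen docks.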
Faster R-CNN outperformed YOLO11n on nearly all accuracy metrics (mAP50 of 0.85 vs. 0.69), particularly at shorter ranges. YOLO11n ran at more than twice the inference speed (37 FPS vs. 17 FPS) but generalized less well, with reduced reliability on docks that differ visually from those seen during training. Both models showed substantial performance drops beyond 100–110 meters, underscoring the need for higher-resolution input or a focus on short-range detection.