Investigating Coastal Classification with Multi-Modal Large Language Models
H.J. de Heer (TU Delft - Electrical Engineering, Mathematics and Computer Science)
J.C. van Gemert – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
S.E. Verwer – Graduation committee member (TU Delft - Algorithmics)
Antonio Moreno-Rodenas – Mentor (Deltares)
Floris Calkoen – Mentor (Deltares)
Abstract
Coastal zones are dynamic and vulnerable regions, demanding accurate, scalable monitoring tools to inform environmental management and hazard mitigation. While satellite imagery and CNN-based classifiers have improved automated mapping, their reliance on unstructured pixel data limits contextual understanding. This study presents the first fine-tuning of a multi-modal large language model (MLLM), Qwen2.5, on 12-channel satellite input for multi-label coastal classification, demonstrating how architectural adaptation enables the integration of spectral, topographic, and derived features beyond RGB. We compare this approach to a ResNet-50 baseline and to state-of-the-art prompting methods using GPT-4o and LLaMA-3.2. Our experiments on the CoastBench dataset reveal that MLLMs benefit substantially from few-shot prompting with diverse, balanced sampling, and that fine-tuning Qwen2.5 with the full 12-channel input outperforms its RGB-only variant. An ablation study quantifies the importance of elevation and water-sensitive indices, while a human benchmark exposes a performance ceiling near F1 ≈ 0.70 due to label ambiguity. Our findings suggest that while MLLMs can rival traditional models and offer interpretability benefits, future gains depend on dataset quality, input diversity, and prompting-strategy design.
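
To illustrate the kind of architectural adaptation the abstract refers to, the following is a minimal PyTorch sketch of widening a ViT-style patch-embedding convolution from 3 to 12 input channels. It assumes the vision encoder's patch embedding is an `nn.Conv2d` (common in open vision encoders such as those used by Qwen2.5-VL); the function name and the zero-initialization of the extra bands are illustrative choices, not the thesis's exact procedure.

```python
import torch
import torch.nn as nn

def expand_patch_embed(conv: nn.Conv2d, in_channels: int = 12) -> nn.Conv2d:
    """Widen a 3-channel patch-embedding conv to accept extra bands.

    Pretrained RGB kernels are copied into the first three input
    channels; the remaining channels start at zero, so the adapted
    encoder initially reproduces the RGB-only behaviour and learns
    the new spectral/topographic bands during fine-tuning.
    """
    new_conv = nn.Conv2d(
        in_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        new_conv.weight.zero_()
        new_conv.weight[:, :3] = conv.weight  # reuse pretrained RGB kernels
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv

# Hypothetical usage: replace a 3-channel patch embedding with a
# 12-channel one before fine-tuning.
proj = expand_patch_embed(nn.Conv2d(3, 768, kernel_size=14, stride=14))
```

Zero-initializing the new channels is one conservative choice among several (duplicating the mean RGB kernel is another); it guarantees the pretrained model's outputs are unchanged at the start of fine-tuning.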
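The "diverse, balanced sampling" of few-shot exemplars can likewise be sketched as selecting in-context examples so that every coastal class is represented at least once. The helper below is hypothetical: it assumes multi-label examples given as `(image_id, labels)` pairs, and the study's actual sampling strategy may differ.

```python
import random
from collections import defaultdict

def balanced_few_shot(examples, k_per_label=1, seed=0):
    """Pick a label-balanced, diverse set of in-context exemplars.

    `examples` is a list of (image_id, labels) pairs, where `labels`
    is the set of coastal classes assigned to that image.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        for label in ex[1]:
            by_label[label].append(ex)
    shots, seen = [], set()
    for label, pool in by_label.items():
        rng.shuffle(pool)
        for ex in pool[:k_per_label]:
            if ex[0] not in seen:  # skip images already chosen for another label
                seen.add(ex[0])
                shots.append(ex)
    return shots

# Toy demo with made-up labels:
demo = [("a", {"beach", "dune"}), ("b", {"cliff"}), ("c", {"beach"})]
print(balanced_few_shot(demo, k_per_label=1))
```

Balancing across labels rather than sampling uniformly over images prevents frequent classes (e.g. beaches) from crowding rare ones out of the prompt, which is consistent with the abstract's finding that MLLMs benefit from balanced few-shot selection.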