Investigating Coastal Classification with Multi-Modal Large Language Models
H.J. de Heer (TU Delft - Electrical Engineering, Mathematics and Computer Science)
J.C. van Gemert – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
S.E. Verwer – Graduation committee member (TU Delft - Algorithmics)
Antonio Moreno-Rodenas – Mentor (Deltares)
Floris Calkoen – Mentor (Deltares)
Abstract
Coastal zones are dynamic and vulnerable regions, demanding accurate, scalable monitoring tools to inform environmental management and hazard mitigation. While satellite imagery and CNN-based classifiers have improved automated mapping, their reliance on unstructured pixel data limits contextual understanding. This study presents the first fine-tuning of a multi-modal large language model (MLLM), Qwen2.5, on 12-channel satellite input for multi-label coastal classification, demonstrating how architectural adaptation enables the integration of spectral, topographic, and derived features beyond RGB. We compare this approach to a ResNet-50 baseline and to state-of-the-art prompting methods using GPT-4o and LLaMA-3.2. Our experiments on the CoastBench dataset reveal that MLLMs benefit substantially from few-shot prompting with diverse, balanced sampling, and that fine-tuning Qwen2.5 with the full 12-channel input outperforms its RGB-only variant. An ablation study quantifies the importance of elevation and water-sensitive indices, while a human benchmark exposes a performance ceiling near F1 ≈ 0.70 due to label ambiguity. Our findings suggest that while MLLMs can rival traditional models and offer interpretability benefits, future gains depend on dataset quality, input diversity, and prompting-strategy design.
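
To illustrate the kind of architectural adaptation the abstract refers to, the following is a minimal PyTorch sketch of widening a ViT-style patch-embedding convolution from 3 to 12 input channels. It assumes the vision encoder's patch embedding is an `nn.Conv2d` (common in open vision encoders such as those used by Qwen2.5-VL); the function name and the zero-initialization of the extra bands are illustrative choices, not the thesis's exact procedure.

```python
import torch
import torch.nn as nn

def expand_patch_embed(conv: nn.Conv2d, in_channels: int = 12) -> nn.Conv2d:
    """Widen a 3-channel patch-embedding conv to accept extra bands.

    Pretrained RGB kernels are copied into the first three input
    channels; the remaining channels start at zero, so the adapted
    encoder initially reproduces the RGB-only behaviour and learns
    the new spectral/topographic bands during fine-tuning.
    """
    new_conv = nn.Conv2d(
        in_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        new_conv.weight.zero_()
        new_conv.weight[:, :3] = conv.weight  # reuse pretrained RGB kernels
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv

# Hypothetical usage: replace a 3-channel patch embedding with a
# 12-channel one before fine-tuning.
proj = expand_patch_embed(nn.Conv2d(3, 768, kernel_size=14, stride=14))
```

Zero-initializing the new channels is one conservative choice among several (duplicating the mean RGB kernel is another); it guarantees the pretrained model's outputs are unchanged at the start of fine-tuning.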
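The "diverse, balanced sampling" of few-shot exemplars can likewise be sketched as selecting in-context examples so that every coastal class is represented at least once. The helper below is hypothetical: it assumes multi-label examples given as `(image_id, labels)` pairs, and the study's actual sampling strategy may differ.

```python
import random
from collections import defaultdict

def balanced_few_shot(examples, k_per_label=1, seed=0):
    """Pick a label-balanced, diverse set of in-context exemplars.

    `examples` is a list of (image_id, labels) pairs, where `labels`
    is the set of coastal classes assigned to that image.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        for label in ex[1]:
            by_label[label].append(ex)
    shots, seen = [], set()
    for label, pool in by_label.items():
        rng.shuffle(pool)
        for ex in pool[:k_per_label]:
            if ex[0] not in seen:  # skip images already chosen for another label
                seen.add(ex[0])
                shots.append(ex)
    return shots

# Toy demo with made-up labels:
demo = [("a", {"beach", "dune"}), ("b", {"cliff"}), ("c", {"beach"})]
print(balanced_few_shot(demo, k_per_label=1))
```

Balancing across labels rather than sampling uniformly over images prevents frequent classes (e.g. beaches) from crowding rare ones out of the prompt, which is consistent with the abstract's finding that MLLMs benefit from balanced few-shot selection.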