From recognition to understanding: enriching visual models through multi-modal semantic integration

Doctoral Thesis (2025)
Author(s)

S. Sharifi Noorian (TU Delft - Web Information Systems)

Contributor(s)

G.J. Houben – Promotor (TU Delft - Web Information Systems)

A. Bozzon – Promotor (TU Delft - Sustainable Design Engineering)

Jie Yang – Copromotor (TU Delft - Web Information Systems)

Research Group
Web Information Systems
Publication Year
2025
Language
English
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This thesis addresses the semantic gap in visual understanding by enriching visual models with semantic reasoning capabilities, enabling them to handle tasks such as image captioning, visual question answering, and scene understanding. The main focus is on integrating visual and textual data, leveraging insights from human cognition, and developing a robust multi-modal foundation model. The research begins by exploring multi-modal data integration to enhance semantic and contextual reasoning in fine-grained scene recognition. The proposed multi-modal models, which combine visual and textual inputs, outperform traditional models that rely solely on visual features, particularly in complex urban environments where visual ambiguities are common. These findings underscore the value of semantic enrichment through multi-modal integration, which helps resolve visual ambiguities and improve scene understanding.
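To make the idea of combining visual and textual inputs concrete, the following is a minimal late-fusion sketch in PyTorch. It is purely illustrative and not the architecture proposed in the thesis: the class name LateFusionSceneClassifier, the embedding dimensions, and the use of simple concatenation are assumptions standing in for whatever encoders and fusion mechanism the thesis actually employs.

```python
# Illustrative late-fusion classifier (hypothetical; not the thesis's model).
# Assumes precomputed image and text embeddings, e.g. from pretrained
# vision and language encoders.
import torch
import torch.nn as nn

class LateFusionSceneClassifier(nn.Module):
    """Concatenates visual and textual embeddings, then classifies the scene."""
    def __init__(self, img_dim: int, txt_dim: int, num_classes: int):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 512),
            nn.ReLU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, img_emb: torch.Tensor, txt_emb: torch.Tensor) -> torch.Tensor:
        # Late fusion: join the two modalities along the feature axis.
        fused = torch.cat([img_emb, txt_emb], dim=-1)
        return self.fusion(fused)

# Usage with random tensors standing in for encoder outputs.
model = LateFusionSceneClassifier(img_dim=768, txt_dim=768, num_classes=10)
logits = model(torch.randn(4, 768), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 10])
```

The intuition behind such a design matches the abstract's claim: when the textual input carries disambiguating context (for instance, a description accompanying a street-level image), its embedding can supply cues that the visual features alone lack, which is why multi-modal models can resolve ambiguities in complex urban scenes.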