ST-Sem

A Multimodal Method for Points-of-Interest Classification Using Street-Level Imagery

Conference Paper (2019)
Author(s)

Shahin Sharifi Noorian (TU Delft - Web Information Systems)

A Psyllidis (TU Delft - Web Information Systems)

A. Bozzon (TU Delft - Web Information Systems)

Research Group
Web Information Systems
Copyright
© 2019 S. Sharifi Noorian, A. Psyllidis, A. Bozzon
DOI
https://doi.org/10.1007/978-3-030-19274-7_3
Publication Year
2019
Language
English
Volume number
11496
Pages (from-to)
32-46
ISBN (print)
978-3-030-19273-0
ISBN (electronic)
978-3-030-19274-7
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Street-level imagery contains a variety of visual information about the facades of Points of Interest (POIs). In addition to general morphological features, signs on the facades of, primarily, business-related POIs can be a valuable source of information about the type and identity of a POI. Recent advancements in computer vision could leverage visual information from street-level imagery and contribute to the classification of POIs. However, the existing literature has not yet assessed the value of visual labels contained in street-level imagery as indicators of POI categories. This paper presents Scene-Text Semantics (ST-Sem), a novel method that leverages visual labels (e.g., texts, logos) from street-level imagery as complementary information for the categorization of business-related POIs. Contrary to existing methods that fuse visual and textual information at the feature level, we propose a late fusion approach that combines visual and textual cues after resolving issues of incorrect digitization and semantic ambiguity in the retrieved textual components. Experiments on two existing datasets and a newly created one show that ST-Sem outperforms visual-only approaches by 80% and related multimodal approaches by 4%.
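
For illustration, the late-fusion step described in the abstract can be sketched as a score-level combination of the two modalities. The Python snippet below is a minimal sketch under assumed names (late_fusion, alpha, the category list are all illustrative); it is not the authors' implementation, only the general late-fusion pattern the paper contrasts with feature-level fusion.

    import numpy as np

    # Hypothetical POI classes for the example.
    CATEGORIES = ["cafe", "pharmacy", "bank"]

    def late_fusion(visual_probs: np.ndarray,
                    text_probs: np.ndarray,
                    alpha: float = 0.5) -> np.ndarray:
        """Combine per-class scores from the two modalities at score level.

        visual_probs: softmax output of a scene classifier on the facade image.
        text_probs:   per-class semantic match of the recognized scene text
                      (e.g., signage) to each category label.
        alpha:        assumed weight balancing the two modalities.
        """
        fused = alpha * visual_probs + (1.0 - alpha) * text_probs
        return fused / fused.sum()  # renormalize to a probability vector

    # Example: the visual model is unsure, but the sign text strongly
    # suggests "pharmacy"; fusing the scores resolves the ambiguity.
    visual = np.array([0.40, 0.35, 0.25])
    textual = np.array([0.05, 0.90, 0.05])
    scores = late_fusion(visual, textual, alpha=0.5)
    print(CATEGORIES[int(np.argmax(scores))])  # -> "pharmacy"

A design note implied by the abstract: fusing at score level rather than feature level keeps the two pipelines independent, which is what allows errors in the text channel (incorrect digitization, semantic ambiguity) to be resolved before the combination step.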
