ST-Sem

A Multimodal Method for Points-of-Interest Classification Using Street-Level Imagery

Conference Paper (2019)
Author(s)

Shahin Sharifi Noorian (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Achilleas Psyllidis (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Alessandro Bozzon (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group
Web Information Systems
DOI related publication
https://doi.org/10.1007/978-3-030-19274-7_3 (final published version)
Publication Year
2019
Language
English
Volume number
11496
Pages (from-to)
32-46
Publisher
Springer
ISBN (print)
978-3-030-19273-0
ISBN (electronic)
978-3-030-19274-7
Event
19th International Conference on Web Engineering, ICWE 2019 (2019-06-11 to 2019-06-14), Daejeon Convention Center (DCC), Daejeon, Republic of Korea
Collections
Institutional Repository
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Street-level imagery contains a variety of visual information about the facades of Points of Interest (POIs). In addition to general morphological features, signs on the facades of primarily business-related POIs could be a valuable source of information about the type and identity of a POI. Recent advancements in computer vision could leverage visual information from street-level imagery and contribute to the classification of POIs. However, the existing literature has not yet assessed the value of visual labels contained in street-level imagery as indicators of POI categories. This paper presents Scene-Text Semantics (ST-Sem), a novel method that leverages visual labels (e.g., texts, logos) from street-level imagery as complementary information for the categorization of business-related POIs. Contrary to existing methods that fuse visual and textual information at the feature level, we propose a late fusion approach that combines visual and textual cues after resolving issues of incorrect digitization and semantic ambiguity of the retrieved textual components. Experiments on two existing datasets and one newly created dataset show that ST-Sem can outperform visual-only approaches by 80% and related multimodal approaches by 4%.

Files

License info not available