X. Li

4 records found

Journal article (2018) - Xinchao Li, Martha Larson, Alan Hanjalic
We propose an image representation and matching approach that substantially improves visual-based location estimation for images. The main novelty of the approach, called distinctive visual element matching (DVEM), is its use of representations that are specific to the query image whose location is being predicted. These representations are based on visual element clouds, which robustly capture the connection between the query and visual evidence from candidate locations. We then maximize the influence of visual elements that are geo-distinctive because they do not occur in images taken at many other locations. We carry out experiments and analysis for both geo-constrained and geo-unconstrained location estimation cases using two large-scale, publicly available datasets: the San Francisco Landmark dataset with 1.06 million street-view images and the MediaEval'15 Placing Task dataset with 5.6 million geo-tagged images from Flickr. We present examples that illustrate the highly transparent mechanics of the approach, which are based on commonsense observations about the visual patterns in image collections. Our results show that the proposed method delivers a considerable performance improvement compared to the state of the art. ...
Conference paper (2017) - Jaeyoung Choi, Martha Larson, Xinchao Li, Kevin Li, Gerald Friedland, Alan Hanjalic
Today's geo-location estimation approaches are able to infer the location of a target image using its visual content alone. These approaches typically exploit visual matching techniques, applied to a large collection of background images with known geo-locations. Users who are unaware that visual analysis and retrieval approaches can compromise their geo-privacy unwittingly open themselves to risks of crime or other unintended consequences. This paper lays the groundwork for a new approach to geo-privacy of social images: Instead of requiring a change of user behavior, we start by investigating users' existing photo-sharing practices. We carry out a series of experiments using a large collection of social images (8.5M) to systematically analyze how photo editing practices impact the performance of geo-location estimation. We find that standard image enhancements, including filters and cropping, already serve as natural geo-privacy protectors. In our experiments, up to 19% of images whose location would otherwise be automatically predictable were unlocalizable after enhancement. We conclude that it would be wrong to assume that geo-visual privacy is a lost cause in today's world of rapidly maturing machine learning. Instead, protecting users against the unwanted effects of pixel-based inference is a viable research field. A starting point is understanding the geo-privacy bonus of already established user behavior. ...
Conference paper (2016) - Xinchao Li, Peng Xu, Yue Shi, Martha Larson, Alan Hanjalic
In this paper, we present a subclass-representation approach that predicts the probability of a social image belonging to one particular class. We explore the co-occurrence of user-contributed tags to find subclasses with a strong connection to the top-level class. We then project each image onto the resulting subclass space, generating a subclass representation for the image. The advantage of our tag-based subclasses is that they have a chance of being more visually stable and easier to model than top-level classes. Our contribution is to demonstrate that a simple and inexpensive method for generating subclass representations has the ability to improve classification results in the case of tag classes that are visually highly heterogeneous. The approach is evaluated on a set of 1 million photos with 10 top-level classes, from the dataset released by the ACM Multimedia 2013 Yahoo! Large-scale Flickr-tag Image Classification Grand Challenge. Experiments show that the proposed system delivers sound performance for visually diverse classes compared with methods that directly model top-level classes. ...
Doctoral thesis (2016) - Xinchao Li
The geographical location at which an image or video was taken is a key piece of multimedia information. Such geo-information has become an indispensable component of systems enabling personalized and context-aware multimedia services. The research reported in this thesis investigates how to automatically derive geo-information from multimedia content. In particular, it focuses on the challenge of estimating the geo-coordinates of the location of an image solely on the basis of its visual content. The goal of the research is to develop a scalable visual content-based location estimation system for images and to investigate the possibilities to improve its accuracy and reliability to a substantial extent. The system should be applicable in both the geo-constrained scenario, in which the multimedia item is taken at one of a previously defined set of locations, and the geo-unconstrained scenario, in which the multimedia item could have been taken anywhere in the world. The thesis makes two different kinds of contributions. The first is high-level framework design. We develop a generic large-scale image retrieval-based framework for location estimation. The second is optimization of specific components of the system. We develop two approaches, geometric verification and geo-distinctive visual element matching, that address specific challenges faced by our retrieval-based framework. The resulting system makes location estimation more tractable in the case of large image collections, and also more reliable. Our experimental results demonstrate that the system leads to an overall significant improvement of the location estimation performance and redefines the state of the art in both geo-constrained and geo-unconstrained location estimation. Based on the findings presented in this thesis, we make recommendations for future research directions, which we think are substantial and promising for large-scale image retrieval and geo-location estimation. ...