Multi-Source Data Modelling to Understand the Effects of Tourism Demand on Air Quality in Italy

Master thesis (2023)

Authors

A.A. Kadiev, Electrical Engineering, Mathematics and Computer Science

Contributors

C. Lofi, Web Information Systems (mentor)

S.S. Chakraborty, Programming Languages (coach)

L. Chu, Web Information Systems (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:b2dc604b-0a91-481e-acd6-90378e27e782

Published Date

26-06-2023

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The goal of this research is to model and understand the effects of tourism demand on air quality by integrating multi-source data. It is aimed at researchers and practitioners who want to perform multidisciplinary research in data science and geoscience, and it presents the methods and challenges that arise in such an analysis. A data processing pipeline describes the research from a data integration perspective, covering the data retrieval and pre-processing tasks. This pipeline enables the construction of datasets for machine learning modelling and for predicting air pollutant levels from tourism data. The study area is Italy, chosen for its significant tourism industry and the wide availability of data about tourism development. In situ air quality data sampled using Google Earth Engine (GEE) around accommodation, transportation and tourism attraction locations is modelled together with tourist arrival numbers, nights spent and average length of stay. Long short-term memory (LSTM) multivariate time series modelling is then performed to understand the predictability of air quality at the national and regional level. To this end, the research examines three stages of modelling tourism with air quality: (i) retrieving accommodation, transportation and tourism attraction locations using the RDF model, (ii) identifying which pollutants are correlated with, and Granger-caused by, the different tourism demand features, using satellite air quality data sampled at the identified tourism locations, and (iii) understanding the performance characteristics of LSTM time series models trained on tourism demand and air quality data. Correlation analysis indicates potential to model the relation between tourism demand indicators and PM2.5 in regions that are overall cleaner in terms of this pollutant.
In these regions, Granger-causality testing suggests a higher chance of predicting PM2.5 time series from the previous month's tourism demand data. Training an LSTM model with this lagged relationship suggests that regions with overall high PM2.5 levels are challenging to model, showing high RMSE scores. LSTM models for these high-PM2.5 regions also required more training epochs than those for overall cleaner regions to model the effects of tourism demand on air quality.
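
The one-month lagged relationship mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis pipeline: it performs neither a Granger-causality test nor LSTM training. It only shows how a tourism demand series from month t - 1 might be aligned with a PM2.5 series at month t, and how correlation and RMSE are computed. The function names (lagged_pairs, pearson, rmse) and all numbers are hypothetical and made up for illustration.

```python
import math


def lagged_pairs(demand, pollutant, lag=1):
    """Pair demand at month t - lag with the pollutant level at month t."""
    if len(demand) != len(pollutant) or not 0 <= lag < len(demand):
        raise ValueError("series must have equal length and 0 <= lag < length")
    return demand[:len(demand) - lag], pollutant[lag:]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(y_true))


# Made-up monthly values (assumption): arrivals in thousands, PM2.5 in ug/m3.
arrivals = [10.0, 12.0, 15.0, 30.0, 42.0, 38.0, 20.0, 11.0]
# Synthetic PM2.5 built so that it follows the previous month's arrivals.
pm25 = [9.0] + [0.5 * a + 4.0 for a in arrivals[:-1]]

x, y = lagged_pairs(arrivals, pm25, lag=1)
r_lag1 = pearson(x, y)             # correlation with a one-month lag
r_lag0 = pearson(arrivals, pm25)   # same-month correlation, for comparison
```

Because the synthetic PM2.5 series is constructed from the previous month's arrivals, the lag-1 correlation is higher than the same-month correlation. With real data, a formal Granger-causality test would be needed before drawing any such conclusion.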

Files

Multi_Source_Data_Modelling_to... (.pdf)

(.pdf | 3.89 Mb)