Y. Xin
Deep learning models are widely used in traffic forecasting and have achieved state-of-the-art prediction accuracy. However, their black-box nature presents challenges for interpretability and usability, particularly when predictions are significantly influenced by complex urban contextual features. This study aims to leverage an explainable artificial intelligence (AI) approach, counterfactual explanations, to enhance the explainability of deep learning-based traffic forecasting models and elucidate their relationships with various contextual features. We present a comprehensive framework that generates counterfactual explanations for traffic forecasting. The study first implements a graph convolutional network (GCN) to predict traffic speed based on historical traffic data and contextual variables. Counterfactual explanations are then generated through a multi-objective optimization process with four objectives (validity, proximity, sparsity, and plausibility), each emphasizing a different aspect of the optimization. We investigate the impact of contextual features on traffic speed prediction under varying spatial and temporal conditions. The scenario-driven counterfactual explanations integrate two types of user-defined constraints, directional and weighting constraints, to tailor the search for counterfactual explanations to specific use cases. These tailored explanations benefit machine learning practitioners who aim to understand the model's learning mechanisms and traffic domain experts who seek insights into the factors necessary to alter traffic conditions. The results showcase the effectiveness of counterfactual explanations in revealing traffic patterns learned by deep learning models and in explaining the relationship between traffic prediction and contextual features, demonstrating their potential for interpreting black-box deep learning models.
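The four objectives named in this abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the definitions below (absolute prediction gap for validity, L1 distance for proximity, count of changed features for sparsity, distance to the nearest observed sample for plausibility) are assumptions borrowed from common counterfactual-explanation practice, and `toy_predict` is a hypothetical stand-in for the trained GCN, not the study's model.

```python
# Hedged sketch: one common way to score a counterfactual candidate.
# All definitions are assumptions; the abstract names the objectives only.

def validity(predict, x_cf, target):
    """Gap between the prediction for the counterfactual and the desired target."""
    return abs(predict(x_cf) - target)

def proximity(x, x_cf):
    """L1 distance between the original input and the counterfactual."""
    return sum(abs(a - b) for a, b in zip(x, x_cf))

def sparsity(x, x_cf, tol=1e-9):
    """Number of features that were changed."""
    return sum(1 for a, b in zip(x, x_cf) if abs(a - b) > tol)

def plausibility(x_cf, observed):
    """L1 distance from the counterfactual to the nearest observed sample."""
    return min(proximity(x_cf, o) for o in observed)

def objectives(predict, x, x_cf, target, observed):
    """Return the four objective values; lower is better for each."""
    return (
        validity(predict, x_cf, target),
        proximity(x, x_cf),
        sparsity(x, x_cf),
        plausibility(x_cf, observed),
    )

# Hypothetical linear "speed model" over two contextual features.
def toy_predict(x):
    return 50.0 - 2.0 * x[0] + 1.0 * x[1]

x_original = [5.0, 2.0]            # predicted speed: 42.0
x_candidate = [3.0, 2.0]           # predicted speed: 46.0
observed_samples = [[3.0, 2.5], [6.0, 1.0]]

scores = objectives(toy_predict, x_original, x_candidate, 46.0, observed_samples)
print(scores)
```

A multi-objective optimizer would search for candidates that trade these four values off against each other rather than collapsing them into a single score; the user-defined directional and weighting constraints mentioned in the abstract would restrict or re-weight that search.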
Deep neural networks are increasingly utilized in mobility prediction tasks, yet their intricate internal workings pose challenges for interpretability, especially in comprehending how various aspects of mobility behavior affect predictions. This study introduces a causal intervention framework to assess the impact of mobility-related factors on neural networks designed for next location prediction, a task focusing on predicting the immediate next location of an individual. To achieve this, we employ individual mobility models to synthesize location visit sequences and control behavior dynamics by intervening in their data generation process. We evaluate the interventional location sequences using mobility metrics and input them into well-trained networks to analyze performance variations. The results demonstrate the effectiveness of the framework in producing location sequences with distinct mobility behaviors, thereby facilitating the simulation of diverse yet realistic spatial and temporal changes. These changes result in performance fluctuations in next location prediction networks, revealing the impacts of critical mobility behavior factors, including sequential patterns in location transitions, proclivity for exploring new locations, and preferences in location choices at population and individual levels. The gained insights hold value for the real-world application of mobility prediction networks, and the framework is expected to promote the use of causal inference to enhance the interpretability and robustness of neural networks in mobility applications.
Cartographic map generalization involves complex rules, and full automation has still not been achieved, despite many efforts over the past few decades. Pioneering studies show that some map generalization tasks can be partially automated by deep neural networks (DNNs). However, DNNs are still used as black-box models in previous studies. We argue that integrating explainable AI (XAI) into a deep learning (DL)-based map generalization process can give more insights for developing and refining the DNNs by revealing exactly what cartographic knowledge is learned. Following an XAI framework for an empirical case study, visual analytics and quantitative experiments were applied to explain the importance of input features for the predictions of a pre-trained ResU-Net model. This experimental case study finds that the XAI-based visualization results can easily be interpreted by human experts. With the proposed XAI workflow, we further find that the DNN pays more attention to the building boundaries than to the interior parts of the buildings. We thus suggest that boundary intersection over union is a better evaluation metric than the commonly used intersection over union for assessing raster-based map generalization results. Overall, this study shows the necessity and feasibility of integrating XAI as part of future DL-based map generalization development frameworks.
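The difference between the two metrics mentioned in this abstract can be shown on a toy raster. This is a simplified sketch, not the study's evaluation code: the boundary is assumed to be the one-pixel ring of mask pixels that touch background (or the image edge) under 4-connectivity, whereas published boundary IoU variants typically use a configurable boundary width. The masks are made-up illustrations.

```python
# Hedged sketch: plain IoU versus a simplified, one-pixel-wide boundary IoU
# on binary raster masks stored as lists of 0/1 rows.

def iou(a, b):
    """Intersection over union of two equally sized binary masks."""
    inter = sum(1 for ra, rb in zip(a, b) for x, y in zip(ra, rb) if x and y)
    union = sum(1 for ra, rb in zip(a, b) for x, y in zip(ra, rb) if x or y)
    return inter / union if union else 1.0

def boundary(mask):
    """Mask pixels with at least one 4-neighbour that is background or off-image."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if not mask[i][j]:
                continue
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if not (0 <= ni < h and 0 <= nj < w) or not mask[ni][nj]:
                    out[i][j] = 1
                    break
    return out

def boundary_iou(a, b):
    """IoU computed on the boundary pixels only (assumed width: one pixel)."""
    return iou(boundary(a), boundary(b))

def rectangle(h, w, r0, r1, c0, c1):
    """Binary mask with ones in rows r0..r1 and columns c0..c1 (inclusive)."""
    return [[1 if r0 <= i <= r1 and c0 <= j <= c1 else 0 for j in range(w)]
            for i in range(h)]

truth = rectangle(6, 6, 1, 4, 1, 4)       # 4 x 4 building footprint
prediction = rectangle(6, 6, 1, 4, 1, 3)  # one column missing on the right

plain = iou(truth, prediction)
edge = boundary_iou(truth, prediction)
print(plain, edge)
```

In this toy case the missing column costs the plain IoU a quarter of its value, while the boundary-only score drops further because the interior pixels no longer cushion the error, which is the sensitivity the abstract argues for.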
The ability to identify causal relationships in spatial data is increasingly important for designing effective policy interventions in environmental science, epidemiology, urban planning, and traffic management. Current spatial data analytic methods rely mainly on descriptive and predictive methods that lack explicit causal models. Spatial causal inference, i.e., causal inference with spatial information, offers a promising tool to address this challenge by extending causal inference methodologies to spatial domains. However, this translation is challenging due to spatial effects that might violate fundamental assumptions of causal inference. Spatial causal inference is therefore still in its infancy, and there is a pressing need to accelerate its theoretical development and support its adoption with a well-grounded methodological toolset. To facilitate the necessary interdisciplinary exchange of ideas, we convened the first Dagstuhl Seminar on Causal Inference for Spatial Data Analytics.
In recent years, car-sharing services have emerged as viable alternatives to private individual mobility, promising more sustainable and resource-efficient, but still comfortable transportation. Research on short-term prediction and optimization methods has improved operations and fleet control of car-sharing services; however, long-term projections and spatial analysis are sparse in the literature. We propose to analyze the average monthly demand in a station-based car-sharing service with spatially-aware learning algorithms that offer high predictive performance as well as interpretability. Our study utilizes a rich set of socio-demographic, location-based (e.g., POIs), and car-sharing-specific features as input, extracted from a large proprietary car-sharing dataset and publicly available datasets. We first compare the performance of different modeling approaches and find that a global Random Forest with geo-coordinates as part of the input features achieves the highest predictive performance, with an R-squared score of 0.87 on test data, while a local linear model, Geographically Weighted Regression (GWR), performs almost on par in terms of out-of-sample prediction accuracy. We further leverage the models to identify spatial and socio-demographic drivers of car-sharing demand. An analysis of the Random Forest via SHAP values, as well as the coefficients of the GWR and Multiscale GWR (MGWR) models, reveals that besides population density and the car-sharing supply, other spatial features such as surrounding POIs play a major role. In addition, MGWR yields exciting insights into the multiscale heterogeneous spatial distributions of factors influencing car-sharing behaviour. Together, our study offers insights for selecting effective and interpretable methods for diagnosing and planning the placement of car-sharing stations.
Vehicle-to-grid and car sharing
Willingness for flexibility in reservation times in Switzerland
Combining vehicle-to-grid (V2G) with car sharing can substantially contribute to the decarbonization of both the energy and transportation sectors. Car-sharing users’ booking slot flexibility is crucial for this integration yet remains underexplored. We developed an integrated choice and latent variable model to estimate the value of the financial incentives needed for shifting slots and how it is affected by socio-demographics, latent attitudes, and trip-level characteristics. We conducted a stated preference survey with car-sharing users in Switzerland. The value of time in our sample ranged between 22.4 CHF/h and 35.5 CHF/h (23.5 USD/h and 37.2 USD/h). Older adults, lower income groups, individuals in employment, and those with a university degree had lower time flexibility. Work trips, leisure trips, trips involving others, and trips taking place during weekdays and morning peaks were harder to alter. This flexibility has the potential to encourage car-sharing operators and users to engage in V2G initiatives, contributing to the decarbonization of transportation and energy systems.
The proliferation of car sharing services in recent years presents a promising avenue for advancing sustainable transportation. Beyond merely reducing car ownership rates, these systems can play a pivotal role in bolstering grid stability through the provision of ancillary services via vehicle-to-grid (V2G) technologies - a facet that has received limited attention in previous research. In this study, we analyze the potential of V2G in car sharing by designing future scenarios for a national-scale service in Switzerland. We propose an agent-based simulation pipeline that considers population changes as well as different business strategies of the car sharing service, and we demonstrate its successful application for simulating scenarios for 2030. To imitate car sharing user behavior, we develop a data-driven mode choice model. Our analysis reveals important differences in the examined scenarios, such as higher vehicle utilization rates for a reduced fleet size as well as in a scenario featuring new car sharing stations. These disparities translate into variations in the power flexibility of the fleet available for ancillary services, ranging from 12 to 50 MW, depending on the scenario and the time of the day. Furthermore, we conduct a case study involving a subset of the car sharing fleet, incorporating real-world electricity pricing data. The case study substantiates the existence of a sweet spot involving monetary gains for both power grid operators and fleet owners. Our findings provide guidelines to decision makers and underscore the pressing need for regulatory enhancements concerning power trading within the realm of car sharing.
Enhanced efforts in the transportation sector should be implemented to mitigate the adverse effects of CO2 emissions resulting from zoning-based planning paradigms. The concept of a 15-minute city, emphasizing proximity-based planning, holds promise in reducing unnecessary travel and progressing towards carbon neutrality. However, a critical research question remains inadequately explored: to what extent is the 15-minute city concept feasible for American cities? This paper presents a comprehensive framework to evaluate the 15-minute city concept using SafeGraph Point of Interest (POI) check-in data across 12 major American cities. Our findings suggest a prevailing reliance on cars among residents due to the spatial distribution of essential activities beyond convenient walking, cycling, and public transit distances. Nevertheless, there exists significant promise for realizing the 15-minute city vision, given that most residents' daily activities can be accommodated within a 15-minute radius by low-emission transportation modes. When comparing cities, it appears that achieving a 15-minute walking city is more feasible for metropolises like New York City, San Francisco, Boston, and Chicago, while proving to be challenging for cities such as Atlanta, Dallas, Houston, and Phoenix. In examining inter-group comparisons, neighborhoods with a higher proportion of White residents and higher median incomes tend to have more accessible POIs, with a substantial percentage of activities concentrated within a 15-minute radius. This demographic also shows a greater propensity to fulfill daily activities through walking, cycling, or public transit trips within a 15-minute travel time, thus presenting a greater potential in CO2 reduction compared to African Americans. This study can offer policymakers insight into how far American cities are from the 15-minute city concept. It also highlights the potential CO2 emissions reductions that could be achieved through successful implementation.
Interpreting Deep Learning Models for Traffic Forecast
A Case Study of Unet
Traffic analysis is crucial for urban operations and planning, while the availability of dense urban traffic data beyond loop detectors is still scarce. We present a large-scale floating vehicle dataset of per-street segment traffic information, Metropolitan Segment Traffic Speeds from Massive Floating Car Data in 10 Cities (MeTS-10), available for 10 global cities with a 15-minute resolution for collection periods ranging between 108 and 361 days in 2019-2021 and covering more than 1500 square kilometers per metropolitan area. MeTS-10 features traffic speed information at all street levels from main arterials to local streets for Antwerp, Bangkok, Barcelona, Berlin, Chicago, Istanbul, London, Madrid, Melbourne, and Moscow. The dataset leverages the industrial-scale floating vehicle Traffic4cast data with speeds and vehicle counts provided in a privacy-preserving spatio-temporal aggregation. We detail the efficient matching approach mapping the data to the OpenStreetMap (OSM) road graph. We evaluate the dataset by comparing it with publicly available stationary vehicle detector data (for Berlin, London, and Madrid) and the Uber traffic speed dataset (for Barcelona, Berlin, and London). The comparison highlights the differences across datasets in spatio-temporal coverage and variations in the reported traffic caused by the binning method. MeTS-10 enables novel, city-wide analysis of mobility and traffic patterns for ten major world cities, overcoming current limitations of spatially sparse vehicle detector data. The large spatial and temporal coverage offers an opportunity for joining the MeTS-10 with other datasets, such as traffic surveys in traffic planning studies or vehicle detector data in traffic control settings.
Location graphs, compact representations of human mobility without geocoordinates, can be used to personalise location-based services. While they are more privacy-preserving than raw tracking data, it was shown that they still hold a considerable risk for users to be re-identified solely by the graph topology. However, it is unclear how this risk depends on the tracking duration. Here, we consider a scenario where the attacker wants to match the new tracking data of a user to a pool of previously recorded mobility profiles, and we analyse the dependence of the re-identification performance on the tracking duration. We find that the re-identification accuracy varies between 0.41% and 20.97% and is affected by both the pool duration and the test-user tracking duration. It is greater if both have the same duration. It is not significantly affected by socio-demographics such as age or gender, but can to some extent be explained by different mobility and graph features. Overall, the influence of tracking duration on user privacy has clear implications for data collection and storage strategies. We advise data collectors to limit the tracking duration or to reset user IDs regularly when storing long-term tracking data.
Deploying real-time control on large-scale fleets of electric vehicles (EVs) is becoming pivotal as the share of EVs relative to internal combustion engine vehicles increases. In this paper, we present a Vehicle-to-Grid (V2G) algorithm that simultaneously schedules the charging and discharging operations of thousands of EVs and can be used to provide ancillary services. To achieve scalability, the monolithic problem is decomposed using the alternating direction method of multipliers (ADMM). Furthermore, we propose a method to handle the bilinear constraints of the original problem inside the ADMM iterations, which changes the problem class from Mixed-Integer Quadratic Program (MIQP) to Quadratic Program (QP), allowing for a substantial computational speed-up. We test the algorithm using real data from the largest car-sharing company in Switzerland and show how our formulation can be used to retrieve flexibility boundaries for the EV fleet. Our work thus enables fleet operators to make informed bids on ancillary services provision, thereby facilitating the integration of electric vehicles.
Conserved quantities in human mobility
From locations to trips
Quantifying intra-person variability in travel choices is essential for the comprehension of activity–travel behaviour. Due to a lack of empirical studies, there is limited understanding of how an individual's travel pattern evolves over months and years. We use two high-resolution user-labelled datasets consisting of billions of GPS track points from ∼3800 individuals to analyse individuals’ activity–travel behaviour over the long term. The general movement patterns of the considered population are characterised using mobility indicators. Despite the differences in the mobility patterns, we find that individuals from both datasets maintain a conserved quantity in the number of essential travel mode and activity location combinations over time, resulting from a balance between exploring new choice combinations and exploiting existing options. A typical individual maintains ∼15 mode–location combinations, of which ∼7 are travelled with a private vehicle every 5 weeks. The dynamics of this stability reveal that the exploration speed of locations is faster than the one for travel modes, and they can both be well modelled using a power-law fit that slows down over time. Our findings enrich the understanding of the long-term intra-person variability in activity–travel behaviour and open new possibilities for designing mobility simulation models.
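A power-law fit of the kind mentioned in this abstract can be sketched as an ordinary least-squares line in log-log space. The functional form y = a * t^b, the fitting procedure, and the data points below are all illustrative assumptions: the values are synthetic (generated from a = 3, b = -0.5) and are not results from the study, which may have used a different estimator.

```python
import math

# Hedged sketch: fit y = a * t**b by linear regression on (log t, log y).
# A negative exponent b means the modelled rate slows down over time.

def fit_power_law(t, y):
    """Return (a, b) minimising squared error of log(y) = log(a) + b * log(t)."""
    lx = [math.log(v) for v in t]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(lx, ly))
    b = sxy / sxx                      # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept in log-log space = log(a)
    return a, b

# Synthetic exploration rates over weeks 1..8 (made-up illustration).
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
rates = [3.0 * w ** -0.5 for w in weeks]

a_hat, b_hat = fit_power_law(weeks, rates)
print(a_hat, b_hat)
```

Fitting one such curve for locations and another for travel modes would allow the two exponents to be compared, which is one way to express the abstract's statement that locations are explored faster than modes.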
Complex simulations and machine-learning models are increasingly applied in research, industry, and governance. However, applying these systems with reasonable accuracy and efficiency requires large-scale efforts of data collection, data transformation, data analysis, and data visualization. At the same time, the cost of maintaining the required infrastructure, software, and personnel skyrockets, making these tools unavailable to many potential users. The paradigm of the digital twin offers a novel perspective on how to manage the data efficiently and make these systems available more steadily at a lower cost. We introduce the first prototype of the Open Digital Twin Platform (ODTP), which is designed to be openly available to all interested parties to enable a common framework and baseline for digital twin based research. ODTP uses containerization, loose coupling, and micro-services to provide dynamically composable digital twins. ODTP also provides tools for licensing resolution, privacy and access control, and reproducibility. In its first iteration presented here, ODTP implements a common mobility research pipeline, the eqasim pipeline for MATSim. These kinds of programs are usually difficult to assemble and use, thus leading to dangerous versions of 'never change a running system'. ODTP converts them into an easy-to-use version, making it possible to initiate mobility simulations with one click. ODTP enables the quick addition of relevant data sources and analytical pipelines related to any topic and makes them easily usable, accessible, and shareable for research, industry, and governance. Thus, ODTP expands the FAIR principle from data to the complete data life cycle.