Non-recurrent traffic events, i.e., unpredictable events such as incidents and vehicle breakdowns, can directly or indirectly influence road traffic. A better understanding of these events could benefit many aspects of the management of the Dutch road network. Traditional traffic event detection, based on significant changes in traffic flow/speed characteristics, is often limited by sparse road sensor coverage. More importantly, traditional detection methods are unable to categorize and describe traffic events.
The aim of this study is to explore to what extent geosocial data (e.g., data from Twitter and Waze) can enrich traditional traffic data (e.g., traffic speed/flow data) in order to improve the detection, categorization, and description of traffic events in the Netherlands. To this end, a pipeline was designed for extracting knowledge on traffic events from geosocial data sources. We collected geosocial data from Twitter, Waze, and TomTom and used traffic data provided by DiTTLab. We specifically focused on reports by real road users, which we define as natural persons who report on their own account, thereby excluding bots and all accounts of legal entities such as public/private organizations. A machine learning approach was applied to automatically classify tweets as either traffic event related or not. To assign tweets to a traffic event category, a rule-based traffic domain annotator was created. Additionally, a geocoding method was developed to link tweets to a geographic location. As Waze and TomTom event reports are classified and geocoded by default, we could cluster these reports together with the processed tweets into combined traffic events, based on their categorical, spatial, and temporal extent. These combined traffic event reports were then linked to traffic data based on corresponding spatial and temporal aspects. To present the collected data, a web-based interactive map application was built.
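The clustering step described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the `Report` fields, the function names, the distance and time thresholds, and the choice of single-linkage grouping via union-find are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class Report:
    """A classified, geocoded event report (hypothetical structure)."""
    source: str      # e.g. "twitter", "waze", "tomtom"
    category: str    # traffic event category
    lat: float
    lon: float
    time: datetime


def haversine_km(a: Report, b: Report) -> float:
    """Great-circle distance between two reports in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def cluster_reports(reports, max_km=1.0, max_gap=timedelta(minutes=30)):
    """Group reports that share a category and lie close in space and time.

    The thresholds are illustrative assumptions, not values from the study.
    Reports are linked pairwise and merged with union-find (single linkage).
    """
    parent = list(range(len(reports)))

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, a in enumerate(reports):
        for j in range(i + 1, len(reports)):
            b = reports[j]
            if (a.category == b.category
                    and abs(a.time - b.time) <= max_gap
                    and haversine_km(a, b) <= max_km):
                parent[find(i)] = find(j)

    clusters = {}
    for i, report in enumerate(reports):
        clusters.setdefault(find(i), []).append(report)
    return list(clusters.values())
```

Single linkage is used here only because it is simple to state; it compares all pairs of reports, so it scales quadratically and would need spatial or temporal indexing for a dataset of realistic size.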
This methodology was applied to data collected over the period from 5 December 2017 to 17 February 2018. Of the collected tweets, approximately 6.71% proved to be traffic event related. Using a linear support vector machine classification model, we achieved an average F1-score of 0.95 and an accuracy of 0.954 in detecting traffic event-related tweets. The rule-based traffic domain annotator showed an average F1-score of 0.874 and an accuracy of 0.964. The geocoding method was able to geocode tweets to a location that covers all place indicators in a tweet in 86% of the evaluated cases. The remaining 14% of the tweets were geocoded either to only part of the relevant indicators or to no relevant indicators at all. Our clustering approach is able to cluster 39.61% of the event reports into a traffic event report cluster consisting of more than one event report, of which 48.66% could be linked to traffic data.
Overall, these results show that geosocial data can be used to enrich traffic data and thereby improve the detection, categorization, and description of non-recurrent traffic events.