Effectiveness of trip planner data in predicting short-term bus ridership

Abstract

Predictions of public transport ridership are beneficial because they allow for sufficient and cost-efficient deployment of vehicles. At an operational level, this calls for short-term predictions with lead times of less than an hour. Whereas conventional data sources on ridership, such as Automatic Fare Collection (AFC) data, may have longer lag times, trip planner data is often available in (near) real time. This paper analyzes how such data from a trip planner app can be utilized for short-term bus ridership predictions. The trip planner data is combined with AFC data (in this case smart card data), which is used to construct a ground truth of actual ridership. The trip planner data is studied using correlation analysis to select informative variables, which are then used to develop four supervised machine learning models (linear, k-nearest neighbors, random forest, and gradient boosting decision tree). The best-performing model relies on random forest regression and reduces the error by approximately half compared to a baseline model based on the weekly trend. We show that this model performance is maintained for prediction lead times of up to 30 minutes ahead, and for different periods of the day.
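As a rough illustration of two steps named in the abstract, the sketch below shows correlation-based variable selection and a weekly-trend style baseline scored by mean absolute error. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the 0.3 correlation threshold, the simplification of the baseline to "the observation one period earlier", the choice of mean absolute error as the error measure, and the toy numbers are all hypothetical and chosen only to make the example runnable.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear information
    return cov / sqrt(var_x * var_y)


def select_variables(candidates, target, threshold=0.3):
    """Keep variables whose absolute correlation with the target
    reaches the threshold (the threshold value is an assumption)."""
    return [name for name, values in candidates.items()
            if abs(pearson(values, target)) >= threshold]


def weekly_baseline(ridership, period):
    """Predict each value as the observation one period earlier.

    The result aligns with ridership[period:]. Using a plain one-period
    shift is a simplification of a 'weekly trend' baseline.
    """
    return ridership[:-period]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between aligned sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy, hand-made numbers for illustration only (not data from the paper).
ridership = [10, 20, 30, 12, 22, 33, 11, 21, 31]
candidates = {
    "planner_requests": [5, 11, 14, 6, 10, 17, 5, 10, 16],
    "constant_flag": [1] * 9,
}
period = 3  # stands in for "one week" of time slots in this toy series

selected = select_variables(candidates, ridership)
baseline_error = mean_absolute_error(
    ridership[period:], weekly_baseline(ridership, period)
)
print("selected variables:", selected)
print("baseline MAE:", baseline_error)
```

In a fuller pipeline, the selected variables would feed a regression model (the abstract names linear, k-nearest neighbors, random forest, and gradient boosting decision tree), and that model's error would be compared against `baseline_error` computed on the same held-out time slots.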