Speeding is a key behavioural factor contributing to increased crash frequencies along road segments, especially horizontal curves. Estimating the effect of speeding on crashes is, however, challenging for several reasons. Traditional methods of collecting speeding data often introduce measurement error into the analysis. In addition, the complex inter-relationship between driver behaviour, roadway geometry, and crash risk leads to endogeneity between speeding and crash risk. While instrumental variable modelling has previously been used to address such endogeneity, its effectiveness depends on strong instruments that correlate well with speeding but not with crashes. Moreover, the effects of explanatory variables on crashes may vary across locations and over time.
This study aims to address these gaps by developing a new methodology that combines improved data collection with a hybrid statistical-machine learning model, for better identification of speeding and more accurate estimation of its effect on crashes. The model, tested on 179 km of horizontal curves along rural roads in Iran, integrates negative binomial regression and gradient boosting with Shapley values. The negative binomial model is specified with random parameters and mixed spline indicators to account for unobserved heterogeneity and temporal instability in the data. The results indicate that the machine learning model has high power for predicting speeding from exogenous variables, complemented by intuitive Shapley values and feature importance for those variables. A comparison of statistical fit between the proposed model and several state-of-the-art modelling candidates showed that the proposed model outperforms existing modelling techniques. The results suggest that curve geometry and traffic characteristics are strong predictors of speeding, while driving more than 20 % over the speed limit substantially contributes to increased crash frequency. The effects of passenger and heavy vehicle traffic on crashes change over time.