To order or not to order: Predicting customer grocery shopping behaviour using multi-label classification techniques
Abstract
Research and Objective: In recent years, the online grocery sector has experienced enormous growth and evolved into a highly competitive business sector. In this demanding environment, strategic information has become extremely important, as it greatly enhances decision-making and the optimisation of the supply chain. This research proposes a novel approach for predicting customers’ daily purchase probabilities, with the goal of improving short-term forecasting accuracy. Besides the well-acknowledged importance of forecasting practices and customer relationship management, this research is motivated by three main observations in online grocery retail: short interpurchase times, consistent shopping patterns and loyal customers. Methodology: The approach applies binary classification methods to analyse and predict online shopping behaviour. Within this context, two non-parametric learning algorithms, namely stochastic gradient boosting and random forest, are compared to traditional logistic regression. Both stochastic gradient boosting and logistic regression are extended using classifier chains (CC) to handle multiple outputs. Subsequently, the obtained purchase probabilities are aggregated and compared to the predictions of a univariate Seasonal Autoregressive Integrated Moving Average Exogenous (SARIMAX) time series model. Results: The boosted tree CC model achieved an improvement of 1.77% in mean absolute percentage error (MAPE) and 20.95% in mean squared logarithm of the accuracy ratio (MSLAR) compared to the predictions of the random forest, and an improvement of 1.15% in MAPE and 16.81% in MSLAR compared to the SARIMAX time series model. The model achieved consistent results for customer groups of different sizes, with prediction errors that exhibited the lowest bias and variance of all models.
The analysis of the explanatory variables indicates that behavioural attributes, in particular variables concerning interpurchase times, were the most significant predictors of the target variables. Finally, the application of calibration methods reduced forecasting performance rather than improving it. Conclusion: This research proposes a novel approach for short-term customer demand prediction within the online grocery retail market, which can provide an alternative to conventional time series forecasting techniques. The obtained results are satisfactory and of value to management and decision makers.
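
Illustration: The aggregation and evaluation steps named in the abstract could be sketched roughly as follows. This is not the author's implementation. It assumes that daily purchase probabilities are aggregated by summing them into an expected order count, and that MSLAR is the mean of the squared natural logarithm of forecast divided by actual; the abstract confirms neither detail. All function names and numbers are hypothetical.

```python
import math


def expected_orders(probabilities):
    """Assumed aggregation: the expected number of orders on one day
    is the sum of the per-customer purchase probabilities."""
    return sum(probabilities)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def mslar(actual, forecast):
    """Mean squared logarithm of the accuracy ratio forecast / actual
    (assumed definition; both series must be strictly positive)."""
    log_ratios = [math.log(f / a) ** 2 for a, f in zip(actual, forecast)]
    return sum(log_ratios) / len(log_ratios)


# Hypothetical purchase probabilities for four customers on two days.
daily_probabilities = [
    [0.75, 0.75, 0.5, 0.5],
    [0.5, 0.25, 0.25, 0.5],
]
# Hypothetical observed order counts for the same two days.
actual_orders = [2, 1]

forecast_orders = [expected_orders(p) for p in daily_probabilities]
print("forecast:", forecast_orders)
print("MAPE (%):", mape(actual_orders, forecast_orders))
print("MSLAR:", mslar(actual_orders, forecast_orders))
```

Under the definition assumed here, MSLAR penalises over- and under-forecasting by the same factor equally, whereas MAPE penalises over-forecasts more heavily. This may be one reason to report both measures.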