Credit scoring for small medium enterprises using transaction data

Managing credit risk is a vital task for financial institutions. Although research into credit risk models is extensive, transaction data remains a relatively untapped data source in these models. We investigate the explanatory value of transaction data for the Bank by developing default classification models for its small and medium-sized enterprise (SME) portfolio. We develop measures that summarize transaction behaviour at the client level over different time windows. Variables that are included in traditional models are positive income shocks, balance returns, zero transactions (indicating rejected direct debits), and relative cash expenditure. By combining these variables with client characteristics and loan behaviour information, we develop a hierarchical logistic regression model with good overall classification performance, reflected by an area under the curve (AUC) of 0.850. When two out of three warnings are allowed to be false, the model identifies more than 50% of the defaults on average. We also investigate relational classification methods, which classify clients according to the similarity of their transaction behaviour. The relational neighbour classifier achieves an AUC of 0.768, using similarities between two clients that are determined by a flexible weight function of the number of shared entities. By combining this approach with the aggregated transaction variables, we develop a model that is based solely on transaction data. The strong performance of this model is reflected by an AUC of 0.804, illustrating the effectiveness of transaction data in default classification.
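The relational neighbour idea can be illustrated with a minimal sketch. It is not the thesis implementation, and it rests on several assumptions that the abstract does not state: clients are linked through the entities (for example counterparties) they transact with, the weight function is a hypothetical power function n ** alpha of the number of shared entities n, and an unlabelled client is scored by the weighted average of the default labels of its labelled neighbours. All function and variable names are illustrative.

```python
from collections import defaultdict


def shared_entity_counts(client_entities):
    """Count, for every pair of clients, the entities they both transact with."""
    clients_by_entity = defaultdict(set)
    for client, entities in client_entities.items():
        for entity in entities:
            clients_by_entity[entity].add(client)

    counts = defaultdict(int)
    for clients in clients_by_entity.values():
        ordered = sorted(clients)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                counts[(a, b)] += 1
    return counts


def weight(n_shared, alpha=0.5):
    """Hypothetical weight function (an assumption): n_shared ** alpha."""
    return n_shared ** alpha


def relational_neighbour_scores(client_entities, labels, alpha=0.5):
    """Score unlabelled clients by the weighted default rate of labelled neighbours.

    labels maps a client to 1 (default) or 0 (no default). Clients without
    any labelled neighbour fall back to the overall default rate.
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for (a, b), n_shared in shared_entity_counts(client_entities).items():
        w = weight(n_shared, alpha)
        for target, neighbour in ((a, b), (b, a)):
            if neighbour in labels:
                numerator[target] += w * labels[neighbour]
                denominator[target] += w

    prior = sum(labels.values()) / len(labels)
    return {
        client: numerator[client] / denominator[client]
        if denominator[client] > 0 else prior
        for client in client_entities
        if client not in labels
    }


# Toy data (invented for illustration only).
client_entities = {
    "A": {"supplier1", "supplier2"},
    "B": {"supplier1", "supplier2", "tax_office"},
    "C": {"tax_office"},
    "D": {"supplier1", "tax_office"},
}
labels = {"A": 1, "C": 0}  # A defaulted, C did not

scores = relational_neighbour_scores(client_entities, labels, alpha=1.0)
```

With alpha set to 1, client B shares two entities with the defaulted client A and one with the non-defaulted client C, so its score is 2/3. Client D shares one entity with each and scores 1/2. A smaller alpha dampens the influence of pairs that share many entities, which is one way a weight function of this kind could be made flexible.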