Pricing of Non-Life Insurance Products

A medium-sized Dutch insurance company offering third-party car insurance products asked whether its premiums can be based on a statistical analysis that takes expected future liabilities into account. This gave rise to the following research questions:
• Which statistical models can be used to base the premiums on expected future liabilities?
• Are there enough data available to predict future liabilities accurately enough?
• How can the ’best’ model be chosen?
• How can the models be implemented?
• What are the results when using these models for the third-party car products?
After a practical introduction to the role of insurance in society, the thesis presents the theory needed to answer the research questions. This theoretical analysis shows that generalized linear models are very useful for pricing non-life insurance products. However, these models have some disadvantages that other models, such as hierarchical generalized linear models, could avoid.

Several methods are explored to determine whether enough data are available to obtain sufficiently credible estimates. One of these methods can be applied before a generalized linear model is implemented.
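As an illustration of a data-sufficiency check that can be run before any model is fitted, the sketch below computes the classical limited-fluctuation (full credibility) standard for claim frequency. This is an assumption chosen for illustration, not necessarily one of the methods used in the thesis. It assumes Poisson-distributed claim counts and a Normal approximation; the function name and the parameter values (90% probability, 5% tolerance) are hypothetical.

```python
from statistics import NormalDist

def full_credibility_claims(probability, tolerance):
    """Expected number of claims needed so that the observed claim frequency
    lies within +/- `tolerance` (relative) of its mean with the given
    probability, assuming Poisson claim counts and a Normal approximation."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    if tolerance <= 0.0:
        raise ValueError("tolerance must be positive")
    # Two-sided quantile of the standard Normal distribution.
    z = NormalDist().inv_cdf((1.0 + probability) / 2.0)
    return (z / tolerance) ** 2

# Hypothetical standard: within 5% of the mean with 90% probability.
required_claims = full_credibility_claims(probability=0.90, tolerance=0.05)
```

A risk group whose expected number of claims falls well below `required_claims` would, under these assumptions, not be given full credibility on its own experience.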

Choosing the ’best’ model is a non-trivial task. Several statistical tests are discussed for deciding which risk factors should be included in the model and how. These cover whether a risk factor should be added as a random or a fixed effect, and which definition of a risk factor is most suitable: as a variate, as a factor, or added dynamically. In addition, several statistical methods are discussed for choosing the distribution that ’best’ fits the observations, both for the number of claims and for the losses. These include graphical comparison methods as well as hypothesis tests.

To answer the question of how the models can be implemented, the statistical programming language R is used. The algorithms that some R packages use to calculate the model estimates are discussed, together with several features of these algorithms. The code is provided in the supplementary section of the thesis.

Next, a statistical analysis was performed for the third-party car products of the insurance company. The theory was applied to the available data, and the unknown quantities were calculated. An analysis was then performed to determine which distribution ‘best’ fits the number of claims and which distribution ‘best’ fits the losses. The data were subdivided by several risk factors, such as age and region, and analyzed again. A generalized linear model with a Poisson distribution and log link was implemented for the number of claims, and a generalized linear model with a Gamma distribution and log link was implemented for the losses. Whether and how each risk factor should be added was evaluated using a bottom-up approach. Initially, the models were fitted without interactions between risk factors; they were then fitted again with interactions allowed, to determine whether this improved the models.
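The two-model structure described above can be written out as follows. This is a sketch in assumed notation, not taken from the thesis: $N_i$ is the number of claims and $Y_i$ the loss per claim for risk group $i$, $e_i$ is an exposure term (its use as an offset is an assumption), $\mathbf{x}_i$ is the vector of risk-factor indicators, and $\boldsymbol{\beta}$, $\boldsymbol{\gamma}$ are the regression coefficients. The last line additionally assumes that frequency and severity are independent and that both models use the same covariates.

```latex
\begin{align*}
N_i &\sim \mathrm{Poisson}(\lambda_i), & \log \lambda_i &= \log e_i + \mathbf{x}_i^{\top}\boldsymbol{\beta},\\
Y_i &\sim \mathrm{Gamma}(\mu_i,\phi), & \log \mu_i &= \mathbf{x}_i^{\top}\boldsymbol{\gamma},\\
\text{pure premium}_i &= \frac{\lambda_i}{e_i}\,\mu_i
  = \exp\!\bigl(\mathbf{x}_i^{\top}(\boldsymbol{\beta}+\boldsymbol{\gamma})\bigr).
\end{align*}
```

With a log link in both models, each risk factor acts as a multiplicative relativity on the premium, which is one reason this link is common in tariff construction.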

Other models that may fit the data better were also implemented. These include generalized linear mixed models, which do not assume that the observations are independent and which assume a Normal distribution for the risk factors added as random effects. A pure premium model based on the Tweedie family was also applied.

The study showed that the preferred models for calculating the pure premium are a generalized linear model with a Negative Binomial distribution and log link for the number of claims, and a generalized linear model with a Gamma distribution and log link for the losses. Because the observed numbers of claims are overdispersed, the Negative Binomial distribution proved to be a better choice than the Poisson and led to a better fit of the model for the number of claims. The Normal distribution was too symmetric for the observations and the Pareto distribution too right-skewed. The pure premium model showed a worse fit than the model for the number of claims. Furthermore, the effect of the risk factors on the risk profile of a risk group was very clear when a two-stage regression approach was used. The hierarchical models did not perform better, because their estimates were less accurate.
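A simple first check for overdispersion is the ratio of the sample variance to the sample mean of the claim counts, which is close to 1 under a Poisson model and larger when a Negative Binomial model may be more appropriate. The sketch below is a minimal illustration; the function name and the claim counts are hypothetical and not taken from the thesis data.

```python
from statistics import mean, variance

def dispersion_index(claim_counts):
    """Sample variance divided by sample mean of the claim counts.
    Values well above 1 indicate overdispersion relative to a Poisson model."""
    m = mean(claim_counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(claim_counts) / m

# Hypothetical claim counts per policy, invented for illustration only.
counts = [0, 0, 0, 0, 0, 0, 1, 0, 0, 4, 0, 0, 2, 0, 0, 0, 0, 5, 0, 0]
index = dispersion_index(counts)
overdispersed = index > 1.0
```

This ratio is only a descriptive diagnostic; a formal comparison of the Poisson and Negative Binomial fits would rely on the tests discussed in the thesis.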

The results for the different models were then compared with the pricing system currently used by the company and with the expected outcomes of the data analysis. This led to recommendations for the insurance company, both on pricing in general and on the pricing system of the third-party insurance product in particular.

The full thesis contains confidential information; a public version was therefore provided in which the insurance company is anonymous. The full thesis was made available to the thesis committee.