Repository hosted by TU Delft Library

Combining an additive and tree-based regression model simultaneously: STIMA

Author: Dusseldorp, E. · Conversano, C. · Os, B.J. van
Institution: TNO Kwaliteit van Leven
Source: Journal of Computational and Graphical Statistics, 19(3), 514-530
Identifier: 409257
Keywords: Health · Environment and health · Boston house price data · Interaction effects · Recursive partitioning · Threshold interactions


Additive models and tree-based regression models are two main classes of statistical models used to predict scores on a continuous response variable. Additive models are known to become very complex in the presence of higher order interaction effects, whereas some tree-based models, such as CART, have problems capturing linear main effects of continuous predictors. To overcome these drawbacks, the regression trunk model has been proposed: a multiple regression model with main effects and a parsimonious number of higher order interaction effects. The interaction effects can be represented by a small tree: a regression trunk. This article proposes a new algorithm, the Simultaneous Threshold Interaction Modeling Algorithm (STIMA), to estimate a regression trunk model. STIMA is more general and more efficient than the initial algorithm (RTA) and is implemented in the R-package stima. Results from a simulation study show that the performance of STIMA is satisfactory for sample sizes of 200 or higher. For sample sizes of 300 or higher, the 0.50 SE rule is the best pruning rule for a regression trunk in terms of power and Type I error. For sample sizes of 200, the 0.80 SE rule is recommended. Results from a comparative study of eight regression methods applied to ten benchmark datasets suggest that STIMA and GUIDE are the best performers in terms of cross-validated prediction error. STIMA appeared to be the best method for datasets containing many categorical variables. The characteristics of a regression trunk model are illustrated using the Boston house price dataset. Supplemental materials for this article, including the R-package stima, are available online. © 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
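
To make the idea of a regression trunk model concrete, the following is a minimal Python sketch and not the STIMA algorithm or the stima package. It assumes simulated data, two hypothetical predictors (x1, x2), and a single threshold interaction region found by a plain grid search over candidate thresholds. It compares a main-effects-only linear model with a model that adds one indicator variable for the region x1 > c1 and x2 > c2. All names, the grid, and the minimum node size of 20 are illustrative assumptions.

```python
import numpy as np

# Simulated data (an assumption for illustration): linear main effects
# plus one threshold interaction in the region x1 > 0 and x2 > 0.5.
rng = np.random.default_rng(0)
n = 400
x1 = rng.uniform(-2, 2, n)
x2 = rng.uniform(-2, 2, n)
y = 1 + 2 * x1 + x2 + 3 * ((x1 > 0) & (x2 > 0.5)) + rng.normal(0, 1, n)


def fit_ols(X, y):
    """Ordinary least squares; returns coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)


# Main-effects-only model: intercept + x1 + x2.
X_main = np.column_stack([np.ones(n), x1, x2])
beta_main, rss_main = fit_ols(X_main, y)

# Candidate thresholds: sample quantiles of each predictor.
grid = np.arange(0.10, 0.91, 0.05)
cands1 = np.quantile(x1, grid)
cands2 = np.quantile(x2, grid)

# Grid search: add one region indicator to the main-effects model and
# keep the threshold pair that gives the smallest residual sum of squares.
best = None
for c1 in cands1:
    for c2 in cands2:
        region = ((x1 > c1) & (x2 > c2)).astype(float)
        # Hypothetical minimum node size of 20 on both sides of the split.
        if region.sum() < 20 or (n - region.sum()) < 20:
            continue
        beta, rss = fit_ols(np.column_stack([X_main, region]), y)
        if best is None or rss < best[0]:
            best = (rss, c1, c2, beta)

rss_trunk, c1_hat, c2_hat, beta_trunk = best
print(f"RSS main effects only: {rss_main:.1f}")
print(f"RSS with one threshold interaction: {rss_trunk:.1f}")
print(f"Estimated thresholds: x1 > {c1_hat:.2f}, x2 > {c2_hat:.2f}")
print(f"Estimated interaction effect: {beta_trunk[3]:.2f}")
```

The sketch keeps the linear main effects in the model while the indicator captures the interaction, which is the property the abstract attributes to regression trunks. A full procedure would grow more than one split and prune the trunk, for example with a c·SE rule on cross-validated error as described above; neither step is shown here.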