Gradient boosting for extreme quantile regression

Journal Article (2023)
Author(s)

J.J. Velthoen (TU Delft - Statistics)

Clément Dombry (Université de Bourgogne)

Juan Juan Cai (Vrije Universiteit Amsterdam)

Sebastian Engelke (Université de Genève)

Research Group
Statistics
Copyright
© 2023 J.J. Velthoen, Clément Dombry, Juan Juan Cai, Sebastian Engelke
DOI related publication
https://doi.org/10.1007/s10687-023-00473-x
Publication Year
2023
Language
English
Issue number
4
Volume number
26
Pages (from-to)
639-667
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Extreme quantile regression provides estimates of conditional quantiles outside the range of the data. Classical quantile regression performs poorly in such cases because data in the tail region are too scarce. Extreme value theory is instead used to extrapolate beyond the range of observed values and to estimate conditional extreme quantiles. Based on the peaks-over-threshold approach, the conditional distribution above a high threshold is approximated by a generalized Pareto distribution with covariate-dependent parameters. We propose a gradient boosting procedure that estimates a conditional generalized Pareto distribution by minimizing its deviance. Tuning parameters such as the number of trees and the tree depths are chosen by cross-validation. We discuss diagnostic tools such as variable importance and partial dependence plots, which help to interpret the fitted models. In simulation studies, we show that our gradient boosting procedure outperforms classical methods from quantile regression and extreme value theory, especially for high-dimensional predictor spaces and complex parameter response surfaces. We also present an application to statistical post-processing of weather forecasts using precipitation data from the Netherlands.
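As a rough illustration of the peaks-over-threshold idea described in the abstract, the following Python sketch writes out the standard generalized Pareto negative log-likelihood for a single exceedance (the per-observation term that a deviance-minimizing procedure would sum over) and the usual formula for extrapolating from an intermediate quantile to an extreme one. The function names, the argument names (`sigma`, `xi`, `tau0`, `threshold`) and the tolerance used for the zero-shape case are assumptions made for this example. It is not the authors' implementation, and it does not include the boosting step that estimates the covariate-dependent parameters.

```python
import math


def gpd_neg_log_likelihood(z, sigma, xi):
    """Negative log-likelihood of one exceedance z >= 0 under a
    generalized Pareto distribution with scale sigma > 0 and shape xi.

    Hypothetical helper for illustration only.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if z < 0:
        raise ValueError("z must be a non-negative exceedance")
    if abs(xi) < 1e-12:
        # Limit xi -> 0: exponential distribution with mean sigma.
        return math.log(sigma) + z / sigma
    t = 1.0 + xi * z / sigma
    if t <= 0:
        # z lies outside the support of the distribution.
        return math.inf
    return math.log(sigma) + (1.0 + 1.0 / xi) * math.log(t)


def extreme_quantile(tau, tau0, threshold, sigma, xi):
    """Extrapolate to the tau-quantile from a threshold assumed to be
    the tau0-quantile, with 0 < tau0 < tau < 1, using the generalized
    Pareto approximation of the tail above the threshold.
    """
    if not 0 < tau0 < tau < 1:
        raise ValueError("need 0 < tau0 < tau < 1")
    ratio = (1.0 - tau) / (1.0 - tau0)
    if abs(xi) < 1e-12:
        # Limit xi -> 0 of (sigma / xi) * (ratio**(-xi) - 1).
        return threshold - sigma * math.log(ratio)
    return threshold + sigma / xi * (ratio ** (-xi) - 1.0)
```

In a covariate-dependent setting, `threshold`, `sigma` and `xi` would each be functions of the predictors, so a call such as `extreme_quantile(0.995, 0.8, u_x, sigma_x, xi_x)` would be evaluated separately for every predictor value, with `u_x`, `sigma_x` and `xi_x` standing for hypothetical fitted values at that point.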