Estimation of conditional CDFs using machine learning

Abstract

This paper presents a novel approach for estimating conditional multivariate cumulative distribution functions (CDFs) within a nonparametric framework. We introduce a binary random variable that indirectly represents the conditional CDF and construct a dataset by pairing input vectors with these binary variables. The approach is general and compatible with a variety of machine learning methods.
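
The abstract does not spell out the binary variable, so the following is only a sketch of one standard construction, assuming the variable is the indicator that the response falls componentwise at or below a threshold. The symbols $Y$, $X$, $y$, $Z_y$, and $d$ are notation introduced here for illustration and may differ from the paper's own.

```latex
% Assumed notation: Y = (Y_1, ..., Y_d) is the response, X the covariate vector,
% y = (y_1, ..., y_d) a fixed threshold, and Z_y the binary variable.
\[
  Z_y = \mathbf{1}\{Y_1 \le y_1, \dots, Y_d \le y_d\},
  \qquad
  F(y \mid x) = P(Y_1 \le y_1, \dots, Y_d \le y_d \mid X = x)
              = \mathbb{E}\left[ Z_y \mid X = x \right].
\]
```

Under this reading, estimating the conditional CDF at $y$ reduces to regressing $Z_y$ on the covariates, which is why any method that estimates a conditional mean or class probability can be used.
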
We have also developed an R package that implements the approach. The package supports a range of machine learning models, including decision trees, neural networks, random forests, and bagged neural networks. By learning the relationship between the covariates and the binary variables, these models estimate the conditional CDFs.
To improve the accuracy and reliability of the estimated CDFs, we apply a rearrangement technique that transforms the estimated functions into monotone ones, bringing them closer to the target CDFs and removing non-monotonic inconsistencies [6].
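
To illustrate the rearrangement idea, here is a minimal sketch for the univariate case, assuming the estimate has been evaluated on an increasing grid of thresholds. The function name `rearrange` and the sample values are hypothetical and are not taken from the R package; the multivariate case would need the same step along each coordinate.

```python
def rearrange(cdf_values):
    """Monotone rearrangement of CDF estimates on an increasing threshold grid.

    Assumption: cdf_values[k] estimates F(y_k | x) for y_0 < y_1 < ... < y_K.
    Values are clipped to [0, 1] and then sorted in ascending order, which
    yields a non-decreasing sequence.
    """
    clipped = [min(1.0, max(0.0, v)) for v in cdf_values]
    return sorted(clipped)


# Hypothetical raw estimates with two monotonicity violations and one value > 1.
raw = [0.10, 0.35, 0.30, 0.62, 0.58, 1.04]
monotone = rearrange(raw)
print(monotone)
```

Sorting leaves the multiset of (clipped) estimates unchanged and only reorders them, so an estimate that is already monotone and within [0, 1] passes through untouched.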
Through simulations, we evaluate the performance of the estimation approach under various scenarios and assess the impact of sample size and correlation on estimation accuracy, using Mean Integrated Squared Error as a key performance metric. The results demonstrate the effectiveness and robustness of the methodology in estimating conditional CDFs, providing a valuable tool for capturing complex dependencies in multivariate data, with potential applications in risk assessment, finance, and environmental modeling.
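
As a rough illustration of the performance metric, the sketch below approximates the Mean Integrated Squared Error on a threshold grid using the trapezoidal rule, averaged over simulation replications. The function names and the toy numbers are hypothetical; the paper's exact integration scheme and simulation design are not given in the abstract.

```python
def integrated_squared_error(estimate, truth, grid):
    """Trapezoidal approximation of the integral of (estimate - truth)^2 over the grid."""
    total = 0.0
    for k in range(len(grid) - 1):
        left = (estimate[k] - truth[k]) ** 2
        right = (estimate[k + 1] - truth[k + 1]) ** 2
        total += 0.5 * (left + right) * (grid[k + 1] - grid[k])
    return total


def mise(estimates, truth, grid):
    """Average the integrated squared error over replications."""
    errors = [integrated_squared_error(e, truth, grid) for e in estimates]
    return sum(errors) / len(errors)


# Toy example (made-up values): a uniform CDF on [0, 1] and two replications.
grid = [0.0, 0.5, 1.0]
truth = [0.0, 0.5, 1.0]
estimates = [
    [0.0, 0.5, 1.0],   # exact estimate, zero error
    [0.1, 0.6, 0.9],   # off by 0.1 at every grid point
]
value = mise(estimates, truth, grid)
print(value)
```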