K.W. van Arem

4 records found

Book chapter (2025) - K.W. van Arem, Jakob Söhl, Mirjam Bruinsma, Geurt Jongbloed
With an average football (soccer) match recording over 3,000 on-ball events, effective use of this event data is essential for practitioners at football clubs to obtain meaningful insights. Models can extract more information from this data, and explainable methods can make them more accessible to practitioners. The Expected Threat model has been praised for its explainability and offers an accessible option. However, selecting the grid size is a challenging key design choice that has to be made when applying the Expected Threat model. Using a finer grid leads to a more flexible model that can better distinguish between different situations, but the accuracy of the estimates deteriorates with a more flexible model. Consequently, practitioners face challenges in balancing the trade-off between model flexibility and model accuracy.
In this study, the Expected Threat model is analyzed from a theoretical perspective and simulations are performed based on the Markov chain of the model to examine its behavior in practice. Our theoretical results establish an upper bound on the error of the Expected Threat model for different flexibilities. Based on the simulations, a more accurate characterization of the model’s error is provided, improving over the theoretical bound. Finally, these insights are converted into a practical rule of thumb to help practitioners choose the right balance between the model flexibility and the desired accuracy of the Expected Threat model. ...
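The Markov-chain view mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used Expected Threat formulation, in which the value of a zone is the chance of scoring directly from it plus the transition-weighted value of the zones the ball may move to. The function name `expected_threat`, the three-zone pitch, and all probabilities below are hypothetical and are not taken from the chapter.

```python
def expected_threat(shoot_prob, goal_prob, move_prob, transition, n_iter=200):
    """Iterate xT(z) = s_z * g_z + m_z * sum_k T[z][k] * xT(k) over all zones.

    shoot_prob[z]    : probability of shooting from zone z
    goal_prob[z]     : probability that a shot from zone z is a goal
    move_prob[z]     : probability of moving the ball (pass or carry) from zone z
    transition[z][k] : probability that a move from zone z ends in zone k
    """
    n_zones = len(shoot_prob)
    xt = [0.0] * n_zones
    for _ in range(n_iter):
        # One sweep of the fixed-point iteration over every zone.
        xt = [
            shoot_prob[z] * goal_prob[z]
            + move_prob[z] * sum(transition[z][k] * xt[k] for k in range(n_zones))
            for z in range(n_zones)
        ]
    return xt


# Hypothetical 3-zone pitch (own half, midfield, final third); made-up numbers.
shoot_prob = [0.0, 0.1, 0.5]
goal_prob = [0.0, 0.05, 0.3]
move_prob = [0.8, 0.7, 0.3]  # the remaining probability mass is losing the ball
transition = [
    [0.5, 0.4, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
]

xt = expected_threat(shoot_prob, goal_prob, move_prob, transition)
print([round(v, 3) for v in xt])
```

In practice every probability in this sketch is estimated from event data per zone. A finer grid means more zones and fewer events per zone, which is the flexibility versus accuracy trade-off the abstract describes.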
Journal article (2025) - K.W. van Arem, Floris Goes-Smit, J. Söhl
Featured Application: This paper studies what models are most suitable for forecasting future values of player performance metrics in association football (soccer). The resulting forecast statistics find applications in team management and player scouting at football clubs. As transfer decisions concern whether a player should play for a club in the future, the predictions of future performance metrics offer a forward-looking improvement over the traditional backward-looking assessments.
Transfers in professional football (soccer) are risky investments because of the large transfer fees and high risks involved. Although data-driven models can be used to improve transfer decisions, existing models focus on describing players’ historical progress, leaving their future performance unknown. Moreover, recent developments have called for the use of explainable models combined with methods for uncertainty quantification of predictions to improve applicability for practitioners. This paper assesses explainable machine learning models in a practitioner-oriented way for the prediction of the future development in quality and transfer value of professional football players. To this end, the methods for uncertainty quantification are studied through the literature. The predictive accuracy is studied by training the models to predict the quality and value of players one year ahead, equivalent to one season. This is carried out by training them on two data sets containing data-driven indicators describing the player quality and player value in historical settings. In this paper, the random forest model is found to be the most suitable model because it provides accurate predictions as well as an uncertainty quantification method that naturally arises from the bagging procedure of the random forest model.
Additionally, this research shows that the development of player performance contains nonlinear patterns and interactions between variables, and that time series information can provide useful information for the modeling of player performance metrics. The resulting models can help football clubs make more informed, data-driven transfer decisions by forecasting player quality and transfer value. ...
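The idea that bagging yields an uncertainty estimate can be sketched as follows. This is an illustration only, not the article's method: a least-squares line stands in for each tree of a random forest, and the data, the function names `fit_line` and `bagged_predict`, and the parameters are all hypothetical. Each model is fitted on a bootstrap resample, and the spread of the individual predictions serves as a rough uncertainty measure.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares line; falls back to a constant if all x values coincide."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bagged_predict(xs, ys, x_new, n_models=200, seed=0):
    """Return the mean and standard deviation of bootstrap-model predictions."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        # Bootstrap: draw n training points with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        slope, intercept = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(slope * x_new + intercept)
    return statistics.fmean(preds), statistics.stdev(preds)


# Hypothetical example: a noisy player-quality indicator over ten seasons.
seasons = list(range(10))
quality = [50, 53, 52, 57, 58, 61, 60, 65, 66, 68]
mean_pred, spread = bagged_predict(seasons, quality, x_new=10)
print(round(mean_pred, 1), round(spread, 2))
```

The spread widens when the bootstrap models disagree, which is the sense in which an uncertainty measure arises from bagging without extra modelling.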
Preprint (2025) - Koen van Arem, Floris Goes-Smit, Jakob Söhl
Transfers in professional football (soccer) are risky investments because of the large transfer fees and high risks involved. Although data-driven models can be used to improve transfer decisions, existing models focus on describing players' historical progress, leaving their future performance unknown. Moreover, recent developments have called for the use of explainable models combined with uncertainty quantification of predictions. This paper assesses explainable machine learning models based on predictive accuracy and uncertainty quantification methods for the prediction of the future development in quality and transfer value of professional football players. Using a historical data set of data-driven indicators describing player quality and the transfer value of a football player, the models are trained to forecast player quality and player value one year ahead. These two prediction problems demonstrate the efficacy of tree-based models, particularly random forest and XGBoost, in making accurate predictions. In general, the random forest model is found to be the most suitable model because it provides accurate predictions as well as an uncertainty quantification method that naturally arises from the bagging procedure of the random forest model. Additionally, our research shows that the development of player performance contains nonlinear patterns and interactions between variables, and that time series information can provide useful information for the modeling of player performance metrics. Our research provides models to help football clubs make more informed, data-driven transfer decisions by forecasting player quality and transfer value. ...
Abstract (2024) - K.W. van Arem, Mirjam Bruinsma
In the last decade, the systematic collection of data has increasingly been used in the world of football, which enables the use of mathematics to improve the performance of individual players and teams. The choice between an xThreat model and a VAEP-like model involves a trade-off between explainability and the ability to take in-game context into account. The main goal of this paper is to create an extended xThreat (expected threat) model that can include game context, thus aiming for explainability while taking into account contextual variables. ...