Prediction of Future Values of Player Performance KPIs in Football

Master Thesis (2024)
Author(s)

K.W. van Arem (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J. Söhl – Mentor (TU Delft - Statistics)

Floris Goes-Smit – Graduation committee member

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
28-07-2024
Awarding Institution
Delft University of Technology
Programme
Applied Mathematics
Sponsors
None
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The introduction of data-based modeling in football (soccer) in the last decade has led to the creation of models that describe player performance through key performance indicators (KPIs). However, relying solely on historical and current KPI values is insufficient for scouting departments, as predicting future values could significantly enhance transfer decision-making. This research aimed to identify the optimal model for forecasting the development of player performance KPIs over the next year, focusing on explainability, uncertainty quantification, and predictive performance.
To achieve this, we implemented linear models, tree-based models, and time-series-based kNN models to forecast two specific KPIs one year into the future: SciSkill, which measures the general quality of a player, and Estimated Transfer Value, which represents the player’s monetary value. Tree-based models showed the best predictive performance. The random forest in particular emerged as the best model due to its explainable predictions, its uncertainty quantification method based on bagging, and its good predictive performance. In the SciSkill case study, the random forest model achieved low loss values, especially for young players. For the Estimated Transfer Value, the random forest model demonstrated the best predictive performance on the general set of players, and specifically on the subset of players valued at over €10 million.
Our findings suggest that tree-based models, particularly the random forest, are well-suited for predicting the future development of football player performance KPIs. Although it is important to monitor the predictive performance using the most recent data, the insights and the resulting models of this research can enhance scouting decisions via both data-informed and data-based decision-making. Finally, this research paves the way for studying the influence of time series information or contextual information on player performance metrics.
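
The abstract names bagging as the basis for the random forest's uncertainty quantification but does not spell out the procedure. The following is a minimal sketch of the general idea only, not the thesis's implementation: each model is fitted on a bootstrap resample, and the spread of the individual predictions gives a rough uncertainty band. Everything in it is an assumption made for illustration: the depth-1 regression tree ("stump") stands in for a full tree, the function names `fit_stump`, `predict_stump` and `bagged_forecast` are hypothetical, and the numbers are synthetic toy values rather than real SciSkill or transfer value data.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: one threshold, one mean per side."""
    best = None
    # Skipping the smallest unique x keeps both sides of every split non-empty.
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # The resample contains a single unique x: always predict the mean.
        mean = statistics.fmean(ys)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def bagged_forecast(xs, ys, x_new, n_models=200, seed=0):
    """Return (mean prediction, 5% quantile, 95% quantile) over bootstrap models."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        # Bootstrap: draw n training rows with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(predict_stump(stump, x_new))
    preds.sort()
    lower = preds[int(0.05 * (n_models - 1))]
    upper = preds[int(0.95 * (n_models - 1))]
    return statistics.fmean(preds), lower, upper


# Synthetic toy data (assumption): current KPI value -> value one year later.
xs = [60, 65, 70, 75, 80, 85, 90, 95]
ys = [66, 70, 74, 78, 81, 85, 89, 93]

point, lower, upper = bagged_forecast(xs, ys, x_new=72)
print(f"forecast {point:.1f}, band [{lower:.1f}, {upper:.1f}]")
```

The band here only reflects how much the bootstrap models disagree with each other; it is not a calibrated prediction interval, and a real random forest would additionally grow deeper trees and subsample features at each split.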
