Predicting Injury Risk with Machine Learning Methods using a Longitudinal Data Set

Master Thesis (2020)
Author(s)

L. Wu (TU Delft - Mechanical Engineering)

Contributor(s)

F.C.T. van der Helm – Mentor (TU Delft - Biomechatronics & Human-Machine Control)

L. Gomaz – Mentor (TU Delft - Statistics)

D.M.J. Tax – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

H.E.J. Veeger – Graduation committee member (TU Delft - Biomechanical Engineering)

Faculty
Mechanical Engineering
Copyright
© 2020 Lian Wu
Publication Year
2020
Language
English
Graduation Date
24-06-2020
Awarding Institution
Delft University of Technology
Project
CAS Project
Programme
Biomedical Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Sports injuries have long been a concern because of their effect on athletes' performance, their financial cost, and their psychological impact. This concern has grown in recent years as sports have become more widely accessible to the general population. Reducing the chance of injury not only allows athletes to maintain optimal performance during training and competition but also benefits them psychologically. Amsterdam UMC has gathered weekly OSTRC data from 19 water polo athletes over 109 weeks. With this data, it is possible to build a model that indicates possible injury risk, helping athletes to manage it. Traditionally, statistical models are used for this kind of analysis, but they often require a large amount of time and a priori knowledge. Hence, this study proposes an injury risk classifier based on longitudinal data (OSTRC) and machine learning (ML) algorithms. The chosen ML algorithms are an artificial neural network (ANN), long short-term memory (LSTM), and Random Forest (RF). To investigate the importance of time dependency between data entries, sliding window (SW) and forward chaining (FC) methods are used during data processing. All models are trained with minimally processed data, sliding window data, and forward chaining data to investigate the impact of time dependency on model accuracy. The OSTRC data set is pre-processed, the SW and FC methods are applied, and the data is then split into training and testing sets. The metrics used to assess model performance are accuracy and the confusion matrix. The results show that RF-SW and LSTM-SW achieved accuracies of 92.17% and 91.67%, respectively. The accuracy of the ANN on training data indicates adequate performance, but its confusion matrix indicates poor performance. The confusion matrices of RF-SW and LSTM-SW show small variations due to differences in the test data set, which can be mitigated as more data become available.
In conclusion, the high accuracy of both RF-SW and LSTM-SW shows that these models can be used to provide insight to athletes, helping them to reduce the chance of injury.
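
The data-processing ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the window width of 3, the toy weekly scores, and the reading of "forward chaining" as an expanding-window train/test split are all assumptions made for illustration. The last step shows one way accuracy can look adequate while the confusion matrix reveals poor performance, namely when injured weeks are a minority class.

```python
from typing import List, Sequence, Tuple


def sliding_windows(
    series: Sequence[float], labels: Sequence[int], width: int
) -> Tuple[List[List[float]], List[int]]:
    """Pair each run of `width` consecutive weeks with the following week's label."""
    X, y = [], []
    for end in range(width, len(series)):
        X.append(list(series[end - width:end]))
        y.append(labels[end])
    return X, y


def forward_chaining_splits(n_samples: int, n_folds: int) -> List[Tuple[range, range]]:
    """Expanding-window splits (assumed meaning of forward chaining):
    fold k trains on all samples before block k and tests on block k,
    so no fold ever trains on data that comes after its test data."""
    block = n_samples // (n_folds + 1)
    splits = []
    for k in range(1, n_folds + 1):
        # The final fold also takes any remainder samples.
        test_end = n_samples if k == n_folds else (k + 1) * block
        splits.append((range(0, k * block), range(k * block, test_end)))
    return splits


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> List[List[int]]:
    """Binary confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


# Hypothetical weekly severity scores and injury labels (1 = injured) for one athlete.
scores = [0, 0, 10, 25, 40, 0, 0, 5, 30, 0, 0, 0]
injured = [0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0]

X, y = sliding_windows(scores, injured, width=3)
splits = forward_chaining_splits(len(X), n_folds=2)

# A trivial classifier that always predicts "not injured".
always_healthy = [0] * len(y)
cm = confusion_matrix(y, always_healthy)
accuracy = (cm[0][0] + cm[1][1]) / len(y)

print(len(X), "windows; first window:", X[0])
print("splits:", splits)
print("confusion matrix:", cm, "accuracy: %.2f" % accuracy)
```

On this toy data the always-healthy classifier reaches an accuracy of about 0.67 while detecting none of the three injured weeks (the bottom row of the matrix is `[3, 0]`), which is why a confusion matrix is a useful companion to accuracy for injury data.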

Files

License info not available