The Use of Data Science for Sports Analysis Purposes

Abstract

This thesis applies data science to sports data. The research is part of a larger project on injury prevention and sports performance called Citius Altius Sanius (CAS). Two data sets from two different projects within CAS are analysed, each with its own goal: one focusses on sports injury prevention in soccer, the other on performance prediction in baseball.
First, we analyse a data set of acceleration measurements from project P6, generated while testing a prototype of wearable sensor trousers during soccer drills. The aim of P6 is to design leg wear with wearable sensors in order to gain more knowledge about hamstring injuries. This requires an algorithm that identifies the intensity of certain movements from sensor data. Features were extracted from the acceleration data in order to classify the intensity. Four methods were then tested on the data, of which the decision tree appears to produce the best results. The analysis showed that this model appears to predict low intensity exercise well (99.1% accuracy), although it struggles significantly more with medium and high intensity exercise (75.5%).
The second data set covers the growth in throwing speed of a group of young baseball athletes between the ages of 12 and 18. The aim of this research was to identify a common growth curve for the throwing speed of pitchers during adolescence and to provide personalised growth curve models. A mixed effects (multilevel) design was chosen to model the growth in throwing speed because it can model the hierarchical nature of the longitudinal data. After analysing the data set and covariates, we found that the number of predictors could be reduced, and with it the cost of collecting data. Furthermore, it is possible to predict throwing speed on a personal level using only age and a single measurement of the predictors and throwing speed, although predictions improve when more measurements are available.
The results of this research can be implemented in the projects, although some complications and opportunities for improvement remain. Recommendations for future research are therefore discussed.
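
To make the feature extraction step for the soccer data more concrete, the following is a minimal sketch of how summary features could be computed from raw acceleration samples before they are passed to a classifier. The abstract does not state which features, window length, or sensor layout the thesis used, so everything here is an assumption for illustration: the function names `magnitude` and `window_features`, the three-axis sample format, the non-overlapping windows, and the choice of mean, standard deviation, and peak as features.

```python
import math
import statistics


def magnitude(sample):
    """Euclidean norm of one (ax, ay, az) acceleration sample."""
    ax, ay, az = sample
    return math.sqrt(ax * ax + ay * ay + az * az)


def window_features(samples, window_size):
    """Summarise acceleration magnitude over non-overlapping windows.

    Hypothetical feature set (mean, standard deviation, peak); the
    thesis may have used different features or overlapping windows.
    Trailing samples that do not fill a complete window are dropped.
    """
    mags = [magnitude(s) for s in samples]
    features = []
    for start in range(0, len(mags) - window_size + 1, window_size):
        window = mags[start:start + window_size]
        features.append({
            "mean": statistics.fmean(window),
            "std": statistics.pstdev(window),
            "peak": max(window),
        })
    return features


# Illustrative (made-up) samples: four stronger readings, then four weaker ones.
example = [(3.0, 4.0, 0.0)] * 4 + [(0.0, 0.0, 1.0)] * 4
feature_rows = window_features(example, window_size=4)
```

Each resulting row describes one window and could serve as one input to an intensity classifier such as a decision tree. Working on the magnitude rather than on the separate axes makes the features independent of sensor orientation, which is a design choice of this sketch rather than something reported in the abstract.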