J. Söhl
34 records found
In a prediction tournament, each contestant is asked a number of questions about the probability that an event will occur before a specific date. Simulations indicate that contestants who perfectly predict these probabilities almost never win the tournament. This effect suggests that an accurate forecaster could increase her chance of winning by introducing some noise into her predictions. The aim of this report is to identify strategies that contestants can use to increase their chance of winning.
In this report, five strategies are introduced: hard-thresholding, soft-thresholding, polynomial strategy, exponential strategy, and random exponential strategy. Each strategy depends on a single parameter. For each strategy, simulations are performed under different settings to determine which strategy results in the most victories. To determine the best parameter for each strategy, polynomial regression is applied to the simulation data.
The simulations suggest that the exponential strategy has the largest positive impact on the number of wins for accurate contestants when all opponents use no additional strategies. If half of the opponents use an exponential or random exponential strategy, then the most accurate contestants are recommended to use a random exponential strategy. Using a strategy appears to have only a negative impact on a contestant’s chance of winning if all opponents use an exponential or random exponential strategy.
The best parameter for an exponential strategy appears to be smaller for less accurate forecasters. The least accurate contestants competing in a prediction tournament are recommended to use no strategy in all previously described situations.
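The effect described above, that a perfectly calibrated contestant rarely wins, can be explored with a small simulation. The following is a minimal sketch and not the report's simulation: the Brier score as the tournament score, uniformly drawn event probabilities, Gaussian noise for the opponents, and the function names are all assumptions made for illustration.

```python
import random


def brier(forecasts, outcomes):
    """Mean squared difference between forecasts and 0/1 outcomes (lower is better)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)


def perfect_win_rate(n_opponents, n_questions, noise_sd, n_tournaments, seed=0):
    """Fraction of simulated tournaments won by a contestant who reports the true probabilities.

    Assumptions (hypothetical, for illustration only): true probabilities are
    uniform on [0, 1], opponents report the truth plus Gaussian noise clipped
    to [0, 1], and ties are awarded to the perfect contestant.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_tournaments):
        probs = [rng.random() for _ in range(n_questions)]
        outcomes = [1 if rng.random() < p else 0 for p in probs]
        perfect_score = brier(probs, outcomes)
        best_opponent = float("inf")
        for _ in range(n_opponents):
            noisy = [min(1.0, max(0.0, p + rng.gauss(0.0, noise_sd))) for p in probs]
            best_opponent = min(best_opponent, brier(noisy, outcomes))
        if perfect_score <= best_opponent:
            wins += 1
    return wins / n_tournaments


if __name__ == "__main__":
    for k in (1, 10, 100):
        print(k, perfect_win_rate(k, n_questions=20, noise_sd=0.2, n_tournaments=500))
```

Running the loop with a growing number of opponents shows how the win rate of the perfect contestant changes as the field gets larger, which is the setting in which adding noise to one's own forecasts becomes attractive.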
This thesis analyzes and compares four forecasting competition mechanisms: the standard deterministic mechanism, the Event Lotteries Forecasting Competition mechanism (ELF), the Independent Event Lotteries Forecasting mechanism (I-ELF), and the Wisdom of the Most Accurate Crowd mechanism (WOMAC). ELF and I-ELF add an amount of randomness to the choice of winner, which makes these mechanisms incentive compatible, although the forecaster with the highest score does not always win. The last mechanism introduced is WOMAC, which scores forecasters against a reference prediction made from the other forecasters' predictions; the forecaster with the highest score wins, and the mechanism is Bayes-Nash incentive compatible. Its disadvantage is that scoring forecasters against a reference prediction rather than against the true probabilities adds an amount of randomness. To select the best mechanism for a prediction tournament, the mechanisms are compared in simulations that use the point mass noise model for realistic forecasting errors. The mechanisms are evaluated on two criteria: the probability of selecting the most accurate forecaster and the degree of randomness introduced in winner selection, quantified using the expectation of the winner's rank. The results show that while ELF and I-ELF achieve strict dominant strategy incentive compatibility, both mechanisms introduce substantial randomness into winner selection, particularly when the accuracy gap between forecasters is small. The I-ELF mechanism was designed by Witkowski et al. (2021) to reduce this randomness as the number of events grows, and a lower bound on the required number of events is derived using Hoeffding's inequality.
The simulations conducted in this thesis confirmed that an unrealistically large number of events would be required to meet this bound, that is, to reduce randomness enough to guarantee a desired probability of the best forecaster winning. The WOMAC mechanism, which scores forecasters against a reference prediction constructed from the other forecasters rather than against the realized outcome, achieves Bayes-Nash incentive compatibility and consistently selects the best forecaster with higher probability and less randomness than ELF and I-ELF across all simulated settings.
The findings suggest that for organizations designing prediction tournaments under the given conditions, WOMAC represents the most practical choice, offering the best trade-off between incentive compatibility and reliable identification of the most accurate forecaster.
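To give a feel for why a Hoeffding-type bound can demand so many events, the sketch below evaluates the textbook two-sided Hoeffding inequality for an average of independent bounded terms. It is not the bound derived in the thesis or in Witkowski et al. (2021); the function name and the parameters `tolerance`, `delta` and `value_range` are assumptions for illustration.

```python
import math


def min_events_hoeffding(tolerance, delta, value_range=1.0):
    """Smallest n such that 2 * exp(-2 * n * tolerance**2 / value_range**2) <= delta.

    Generic Hoeffding sample size for the mean of n independent terms, each
    confined to an interval of width value_range: with this many terms the
    mean deviates from its expectation by at least `tolerance` with
    probability at most `delta`.
    """
    if tolerance <= 0 or not 0 < delta < 1:
        raise ValueError("tolerance must be positive and delta must lie in (0, 1)")
    n = value_range ** 2 * math.log(2.0 / delta) / (2.0 * tolerance ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    for tol in (0.1, 0.05, 0.01):
        print(tol, min_events_hoeffding(tol, delta=0.05))
```

Because the required number of terms grows with the inverse square of the tolerance, halving the accuracy gap that has to be resolved roughly quadruples the number of events, which is consistent with the observation that small gaps between forecasters make the bound impractical.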
Two main contributions were made. First, a simulation framework was developed to construct predictive distributions for the rework LR. Starting from the deconvolution of the original profile, plausible contributor genotypes are sampled, additional replicate profiles are simulated, and the LR of the combined profile is calculated. Second, a Bayesian MCMC implementation was developed for the EuroForMix/DNAStatistX peak-height model, making it possible to propagate uncertainty in the nuisance parameters when computing LR values.
The framework was evaluated on cleaned two-person NFI research data, focusing on minor contributors. The frequentist plug-in simulation was not sufficiently calibrated: nominal 95% prediction intervals covered only 69.0% of the observed minor true-donor rework LRs. Including Bayesian parameter uncertainty improved the empirical coverage to 81.6% and reduced the mean interval score from 50.5 to 21.6. However, the predicted distributions remained insufficiently calibrated for casework use.
Overall, this thesis shows that predicting rework LRs is possible in principle and that parameter uncertainty is important for such predictions. The current framework should be viewed as a mathematical proof of concept rather than an operational tool. Further work is needed on artefact modelling, computational scaling, full MCMC validation, extension to more complex mixtures, and validation on casework-like data.
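The two calibration summaries quoted above, empirical coverage and mean interval score, can be computed as in the following sketch. It assumes the standard interval score for a central (1 - alpha) prediction interval (interval width plus a 2/alpha penalty per unit of miss); whether the thesis uses exactly this definition, and on which scale the LRs are measured, is not stated in the abstract. The example values are made up.

```python
def interval_score(lower, upper, observed, alpha=0.05):
    """Interval score of a central (1 - alpha) prediction interval (lower is better)."""
    width = upper - lower
    below = (2.0 / alpha) * max(0.0, lower - observed)  # penalty when the observation falls below
    above = (2.0 / alpha) * max(0.0, observed - upper)  # penalty when the observation falls above
    return width + below + above


def coverage_and_mean_score(intervals, observations, alpha=0.05):
    """Return (empirical coverage, mean interval score) over paired intervals and observations."""
    covered = 0
    total = 0.0
    for (lower, upper), x in zip(intervals, observations):
        covered += lower <= x <= upper
        total += interval_score(lower, upper, x, alpha)
    n = len(observations)
    return covered / n, total / n


if __name__ == "__main__":
    # Hypothetical intervals and observed values, for illustration only.
    intervals = [(2.0, 6.0), (1.0, 4.0), (3.0, 9.0)]
    observed = [5.0, 4.5, 8.0]
    print(coverage_and_mean_score(intervals, observed))
```

A well-calibrated nominal 95% interval should cover close to 95% of the observations, and among methods with similar coverage the one with the lower mean interval score gives the sharper intervals.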
This thesis extends that pointwise normality result to a functional CLT in L²(K) (where K ⊂ ℝ is compact) for the estimated Lévy density. We first derive a candidate covariance kernel of the exponentially-tilted estimation error, and identify a central structural obstruction: the rescaled kernel converges to an oscillatory kernel that is not integrable in ℝ². Hence, the associated covariance operator is not nuclear in L²(ℝ), and a Giné-León Hilbert space CLT cannot be obtained. We therefore restrict the domain to a compact set and modulate the error by a cosine factor to remove the oscillations, which yields a Giné-León CLT for the linear part of the error. The bias and remainder terms vanish under appropriate scaling, yielding a functional Central Limit Theorem for the full cosine-modulated error.
As an application, this result is transferred through the Gil-Pelaez formula to obtain a convergence-in-distribution result for the pricing error of a digital call option where the error enters through estimation of the Lévy density. This enables the computation of finite-sample confidence intervals that bridge the gap between the theory and practice. Finally, possible extensions are discussed.
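For orientation, the Gil-Pelaez inversion formula and the way it produces a digital call price can be written as follows. The notation is an assumption made here for illustration and is not taken from the thesis: $X = \ln S_T$ is the log-price at maturity $T$ under the pricing measure, $\varphi_X$ is its characteristic function, $K$ is the strike, and $r$ is a constant interest rate.

```latex
% Gil-Pelaez inversion, valid at continuity points x of the distribution function F_X:
F_X(x) = \frac{1}{2} - \frac{1}{\pi} \int_0^\infty
         \frac{\operatorname{Im}\!\left[ e^{-iux}\, \varphi_X(u) \right]}{u} \, du

% Digital call paying 1 if S_T > K:
C_{\text{digital}} = e^{-rT} \bigl( 1 - F_X(\ln K) \bigr)
                   = e^{-rT} \left( \frac{1}{2} + \frac{1}{\pi} \int_0^\infty
                     \frac{\operatorname{Im}\!\left[ e^{-iu \ln K}\, \varphi_X(u) \right]}{u} \, du \right)
```

Since the characteristic function of a Lévy process is determined by its Lévy triplet through the Lévy-Khintchine formula, an estimation error in the Lévy density propagates into $\varphi_X$ and from there into the price.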
Forensic Evidence Interpretation Using Likelihood Ratios
A Study on Prior Probabilities and LR Distributions for DNA Donors
Two detailed case studies illustrate the impact of introducing new persons of interest (PoIs) and how prior knowledge about associations between individuals can alter posterior probabilities. A comparison is also drawn between categorical and probabilistic approaches in body fluid analysis, with the latter offering a more nuanced interpretation of mRNA profiling data.
In the second part, the thesis introduces methods to estimate LR distributions for DNA contributors. These include threshold-based and genotype sampling techniques, which are tested across synthetic mixtures with varying contributor ratios. Furthermore, the behavior of LRs is studied for relatives of the true donor.
The findings underscore the importance of transparently reporting assumptions about priors and the value of presenting LR tables to facilitate Bayesian reasoning by decision makers. Overall, the thesis contributes to a more robust and interpretable application of statistical reasoning in forensic science.
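The Bayesian reasoning that such a table supports is the odds form of Bayes' rule: posterior odds equal the likelihood ratio times the prior odds. The sketch below is one plausible reading of an "LR table", listing the posterior probability for a range of priors at a fixed LR; the function names and the example numbers are assumptions for illustration and do not come from the thesis.

```python
def posterior_probability(prior_probability, likelihood_ratio):
    """Update a prior probability with a likelihood ratio using the odds form of Bayes' rule."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def lr_table(likelihood_ratio, priors):
    """Pair each prior probability with the posterior it yields for the given LR."""
    return [(p, posterior_probability(p, likelihood_ratio)) for p in priors]


if __name__ == "__main__":
    # Hypothetical LR and priors, for illustration only.
    for prior, posterior in lr_table(1000.0, [0.0001, 0.001, 0.01, 0.1, 0.5]):
        print(f"prior {prior:<7} -> posterior {posterior:.4f}")
```

Presenting the same LR against several priors makes the dependence on the prior explicit and leaves the choice of prior to the decision maker.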
Deep Learning and Side-Channel Analysis
A Language Model-Inspired framework
Using Masking Techniques in Authorship Attribution
Finding the optimal number of words to mask, taking into account the performance and topic robustness of the model
Firstly, the performance of state-of-the-art computational authorship attribution methods was assessed on Dutch, forensically relevant corpora. The compared methods were support vector machines combined with masking, using either word or character n-grams as features; BERT-based models using a mean pooling strategy to handle long texts; and a baseline consisting of a logistic regression model with the 100 most frequent Dutch words as features. We observed performance differences between the state-of-the-art methods similar to those reported in the literature. The best-performing method was a support vector machine without masking using character n-grams as features. In comparison, both the baseline and the BERT-based models performed worse on our corpora.
Secondly, a score-based likelihood ratio system was created to modify the computational authorship attribution methods for usage in forensics. This method is based on kernel density estimators and uses cross-calibration to handle the small number of training and calibration texts of the suspect. For most methods, the performance is in line with the previous performances outside the likelihood ratio system, except for the BERT-based methods, which significantly underperform when part of a likelihood ratio system. This is likely caused by the combination of cross-calibration and the randomness in finetuning BERT models.
Additionally, authorship attribution methods should be topic-robust, such that their attribution is not biased by the topic of a text. We introduced two new metrics to measure the topic-robustness of authorship attribution methods, ‘topic impact’ and ‘conversation impact’. These metrics can only be used on specific types of corpora: the topic impact can be computed on topic-controlled corpora and the conversation impact on conversational corpora. To study whether both metrics measure the topic-robustness of authorship attribution methods for their respective corpus type, we computed the correlation between the results of the two metrics across a range of authorship attribution methods.
We found a correlation of 0.68. As a result, we cannot conclude that the conversation impact is a perfect metric to measure the topic-robustness of methods using conversational corpora, but it does give a good indication of large differences between methods.
Using this new metric, we found that our best-performing methods suffered from a high conversation impact and, as a result, might be more likely to have a low topic-robustness. If more of the infrequent words were masked, the conversation impact decreased, but so did the performance. A trade-off between high performance and high topic-robustness must be made when a model is chosen for real forensic case work. The conversation impact metric we proposed can help quantify these effects on forensically relevant corpora and therefore assist in making better choices.
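The masking step discussed above can be illustrated with a small sketch in which only the most frequent words are kept and every other word is replaced by a placeholder. This is an assumed, simplified form of masking: the whitespace tokenisation, the `#` mask token and the function names are choices made here for illustration, and the thesis may mask differently.

```python
from collections import Counter


def build_vocabulary(texts, keep_top_k):
    """Return the set of the keep_top_k most frequent lower-cased words in the texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word for word, _ in counts.most_common(keep_top_k)}


def mask_text(text, vocabulary, mask_token="#"):
    """Replace every word outside the vocabulary with the mask token."""
    return " ".join(w if w in vocabulary else mask_token for w in text.lower().split())


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "the dog sat"]
    vocab = build_vocabulary(corpus, keep_top_k=2)
    print(mask_text("the cat sat", vocab))  # keeps "the" and "sat", masks "cat"
```

Lowering `keep_top_k` masks more of the infrequent, typically topic-bearing words, which is the knob behind the trade-off between performance and topic-robustness described above.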
To achieve this, we implemented linear models, tree-based models, and time-series-based kNN models to forecast two specific KPIs one year in the future: SciSkill, which measures the general quality of a player, and Estimated Transfer Value, representing the player’s monetary value. Tree-based models showed the best predictive performance. The random forest in particular emerged as the best due to its explainable predictions, uncertainty quantification method based on bagging, and good predictive performance. In the SciSkill case study, the random forest model achieved low loss values, especially for young players. For the Estimated Transfer Value, the random forest model demonstrated the best predictive performance on the general set of players, and specifically on the subset of players valued at over €10 million.
Our findings suggest that tree-based models, particularly the random forest, are well-suited for predicting the future development of football player performance KPIs. Although it is important to monitor the predictive performance using the most recent data, the insights and the resulting models of this research can enhance scouting decisions via both data-informed and data-based decision-making. Finally, this research paves the way to study the influence of time series information or contextual information on player performance metrics.
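How bagging can yield an uncertainty estimate is sketched below with a deliberately tiny stand-in for a random forest: an ensemble of depth-1 regression trees (stumps) on a single feature, each fitted to a bootstrap resample, with the spread of the individual predictions used as the uncertainty. This is an assumption about the general idea, not the thesis's method or model, and all names and data are hypothetical.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one feature; returns (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no valid split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:  # all feature values equal: predict the overall mean
        mean = statistics.fmean(ys)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def bagged_prediction(xs, ys, x_new, n_models=200, seed=0):
    """Return (mean, standard deviation) of the predictions of bootstrap-trained stumps."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(predict_stump(stump, x_new))
    return statistics.fmean(preds), statistics.pstdev(preds)


if __name__ == "__main__":
    ages = [18, 19, 21, 24, 27, 30]    # hypothetical feature values
    kpi = [60, 64, 70, 75, 76, 74]     # hypothetical KPI one year later
    print(bagged_prediction(ages, kpi, x_new=20))
```

A wide spread across the ensemble members signals that the prediction depends strongly on which training examples happened to be resampled, which is the sense in which bagging provides an uncertainty measure.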
The relationship between the true self-diffusivity $D^\infty_{\text{self}}$ and the measured self-diffusivity $D^{\text{MS}}_{\text{self}}$ is described by the formula:
$$D^\infty_{\text{self}} = D^{\text{MS}}_{\text{self}} + \frac{\xi k_B T}{6\pi \eta L}$$
where $L$ is the edge length and $\eta$ is the viscosity. This leads to a linear regression model in which $D^{\text{MS}}_{\text{self}}$ is the dependent variable and $\frac{1}{L}$ is the independent variable. The intercept and the slope must be estimated accurately.
Weighted regression was used to address heteroscedasticity in the simulation errors. Measurements in larger cubes carry more weight in the regression because they are more accurate, while measurements in smaller cubes carry less weight. The variance function $\sigma(x)$ and the cost function $f(x)$ play a crucial role in determining the optimal edge lengths and weights. In particular, it is shown that for higher orders of the variance function (e.g. $\sigma(x) = x^3$) larger edge lengths are preferred.
In addition, the simultaneous estimates of both the viscosity and the self-diffusivity were improved by using the covariance matrix of the regression. The dependence between the intercept and the slope is used to shrink the confidence region by 85\% compared with a traditional approach that ignores the covariance. This is illustrated with confidence regions, which showed that including the covariance helps reduce the uncertainty in the estimates.
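A minimal sketch of the weighted regression described above: rearranging the formula gives $D^{\text{MS}}_{\text{self}} = D^\infty_{\text{self}} - \frac{\xi k_B T}{6\pi \eta} \cdot \frac{1}{L}$, so the intercept of a fit against $1/L$ estimates $D^\infty_{\text{self}}$ and the slope carries the viscosity. The code uses the closed-form weighted least-squares solution and assumes weights equal to the inverse variances of the measurements; the function name and the numbers are hypothetical and not taken from the thesis.

```python
def weighted_linear_fit(x, y, weights):
    """Weighted least squares for y = intercept + slope * x.

    Returns (intercept, slope, var_intercept, var_slope, cov_intercept_slope).
    The (co)variances are the entries of (X^T W X)^{-1}, which is the
    covariance matrix of the estimates when each weight equals 1 / variance.
    """
    sw = sum(weights)
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sw
    ybar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - xbar) * (yi - ybar) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    var_slope = 1.0 / sxx
    var_intercept = 1.0 / sw + xbar ** 2 / sxx
    cov = -xbar / sxx  # negative whenever the weighted mean of x is positive
    return intercept, slope, var_intercept, var_slope, cov


if __name__ == "__main__":
    # Hypothetical edge lengths, measured diffusivities and standard errors.
    lengths = [2.0, 4.0, 8.0]
    d_ms = [2.0, 2.5, 2.75]
    sigmas = [1.0, 0.7, 0.5]
    inv_l = [1.0 / length for length in lengths]
    w = [1.0 / s ** 2 for s in sigmas]
    print(weighted_linear_fit(inv_l, d_ms, w))
```

Because $1/L$ is always positive, the intercept and slope estimates are negatively correlated, and it is this off-diagonal term that a joint confidence region can exploit.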
This thesis is written in the context of the Citius Altius Sanius (CAS) project aimed at injury prevention and performance improvement in sports. The CAS project combines the expertise of data scientists, industrial designers and biomechanical engineers together with the resources of sports associations and sports equipment designers among others. The goal of the CAS project is to initiate collaboration between various universities and departments to develop sensor technology, provide analysis based on the sensor data and provide a clear guideline of feedback to the athlete.
The primary goal of this thesis is to extract meaningful insights from sensor data through statistical modeling. Two sources of sensor data are used within the thesis: data from prototype sensor trousers worn by football players during training and data from a sensor sleeve worn by tennis players during serve practice. The research employs supervised learning algorithms within the framework of machine learning and deep learning models for capturing intricate patterns in the data as well as functional data analysis techniques such as functional principal components analysis and functional regression models applied for imputation purposes and dimension reduction.
Throughout this thesis, we consistently used a neural network architecture that mixes convolutional and recurrent layers. The main application of this network lies in recognizing football-related activities using sensor data. The neural network achieves good accuracy and is easily adaptable to other human activity recognition problems. We also considered various other models for this task; however, none could match the computational speed and accuracy of the neural network. Nonetheless, given the plethora of methods that were tested and our dissatisfaction with the accuracy measures used to assess their goodness-of-fit, a novel quality measure was introduced for activity recognition problems that leverages domain knowledge to determine the accuracy of an activity recognition method. In our application, one of the constraints is the length of the predicted activities. The measure accounts for the fact that activities such as jumping or passing a ball realistically have a minimum duration. Instances where a prediction model outputs an activity shorter than physically plausible incur harsh penalties.
We also propose a novel post-processing procedure tailored specifically to human activity recognition problems, ensuring that predictive models adhere to physical constraints, such as the minimum duration of activities. This post-processing method aims to increase the accuracy of prediction models which violate these constraints and as a result, to narrow the gap in accuracy between different prediction methods.
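One simple way to enforce a minimum activity duration on a sequence of per-frame predictions is sketched below. It is a hypothetical rule chosen for illustration, not the post-processing procedure proposed in the thesis: the shortest run that violates the minimum length is absorbed by its longer neighbouring run, and this is repeated until no violation remains. A single `min_length` for all activities is a further simplification.

```python
from itertools import groupby


def enforce_min_duration(labels, min_length):
    """Relabel runs shorter than min_length with the label of the longer adjacent run."""
    runs = [[label, sum(1 for _ in group)] for label, group in groupby(labels)]
    while len(runs) > 1:
        short = [i for i, (_, n) in enumerate(runs) if n < min_length]
        if not short:
            break
        i = min(short, key=lambda k: runs[k][1])           # shortest violating run
        left = runs[i - 1][1] if i > 0 else -1
        right = runs[i + 1][1] if i + 1 < len(runs) else -1
        j = i - 1 if left >= right else i + 1              # longer neighbour absorbs it
        runs[j][1] += runs[i][1]
        del runs[i]
        merged = [runs[0]]                                 # fuse neighbours that now share a label
        for label, n in runs[1:]:
            if label == merged[-1][0]:
                merged[-1][1] += n
            else:
                merged.append([label, n])
        runs = merged
    return [label for label, n in runs for _ in range(n)]


if __name__ == "__main__":
    predicted = ["run"] * 5 + ["jump"] + ["run"] * 4
    print(enforce_min_duration(predicted, min_length=3))
```

The output always has the same length as the input, so it can be compared frame by frame with the ground truth before and after post-processing.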
In the context of tennis, we encountered difficulties in predicting the serve performance metrics using sensor data. While predicting the ball speed can be easily achieved, accurately predicting the velocity-accuracy index (VA index), which combines ball speed with serve accuracy, proved more complex. To assess the effectiveness of our model in distinguishing true predictions from noise, we applied a permutation test. Notably, the main contribution of this research lies in the rigorous formulation of the null hypothesis for this test, linking it to established permutation test theory.
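A generic permutation test of whether predictions carry any information about the target is sketched below. The null hypothesis assumed here is that predictions and observed values are exchangeable, that is, unrelated, and the test statistic is the Pearson correlation; the thesis's own formulation of the null hypothesis and its statistic are not given in the abstract and may differ. Inputs are assumed to be non-constant.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(predicted, observed, n_permutations=999, seed=0):
    """One-sided p-value for 'predictions are unrelated to the observed values'.

    The observed values are shuffled repeatedly; the p-value is the share of
    shuffles (plus the unshuffled data itself) whose correlation with the
    predictions is at least as large as the one actually observed.
    """
    rng = random.Random(seed)
    statistic = pearson(predicted, observed)
    shuffled = list(observed)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if pearson(predicted, shuffled) >= statistic:
            count += 1
    return (1 + count) / (1 + n_permutations)


if __name__ == "__main__":
    # Hypothetical predictions and observed index values, for illustration only.
    predicted = [0.2, 0.5, 0.1, 0.9, 0.7, 0.4, 0.8, 0.3]
    observed = [0.3, 0.4, 0.2, 0.8, 0.9, 0.5, 0.6, 0.1]
    print(permutation_p_value(predicted, observed))
```

Counting the unshuffled data in both numerator and denominator keeps the p-value strictly positive and the test valid for any number of permutations.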
This research contributes to the fields of sports science and data analysis by offering insights into activity recognition and performance prediction using sensor data. The methodologies developed here have potential applications across various other sports as well as activities unrelated to sports. While the data provided for this research come from wearable sensors, these models and procedures can also be applied to other types of sensor data or even beyond.
This thesis is written in the context of the Citius Altius Sanius (CAS) project, which is aimed at injury prevention and performance improvement in sports. The CAS project combines the expertise of data scientists, industrial designers and biomechanical engineers with the resources of sports associations and sports equipment designers, among others. The goal of the CAS project is to initiate collaboration between various universities and departments to develop sensor technology, provide analysis based on the sensor data and provide clear feedback guidelines to the athlete.
The primary goal of this thesis is to extract meaningful insights from sensor data through statistical modeling. Two sources of sensor data are used: data from prototype sensor trousers worn by football players during training, and data from a sensor sleeve worn by tennis players during serve practice. The research employs supervised learning algorithms, in the form of machine learning and deep learning models that capture intricate patterns in the data, as well as functional data analysis techniques, such as functional principal component analysis and functional regression models, which are applied for imputation and dimension reduction.
We used a neural network architecture that mixes convolutional and recurrent layers consistently throughout this thesis. Its main application is recognizing football-related activities from sensor data. The network achieves good accuracy and is easily adaptable to other human activity recognition problems. We also considered various other models for this task; however, none could match the computational speed and accuracy of the neural network. Because many methods were tested and the existing accuracy measures proved unsatisfactory for assessing their goodness-of-fit, we introduced a novel quality measure for activity recognition problems that uses domain knowledge to determine the accuracy of an activity recognition method. In our application, one of the domain constraints is the length of the predicted activities: the measure accounts for the fact that activities such as jumping or passing a ball realistically have a minimum duration. Instances where a prediction model outputs an activity shorter than physically plausible incur harsh penalties.
We also propose a novel post-processing procedure tailored specifically to human activity recognition problems, which ensures that predictive models adhere to physical constraints such as the minimum duration of activities. This post-processing method aims to increase the accuracy of prediction models that violate these constraints and, as a result, to narrow the gap in accuracy between different prediction methods.
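As a rough illustration of what a minimum-duration post-processing step could look like, the sketch below assumes that a model emits one label per time sample and that any run of identical labels shorter than a hypothetical threshold `min_len` is absorbed by the preceding run. The function name, the threshold and the merge rule are assumptions made for this example; the abstract does not describe the actual procedure used in the thesis.

```python
from itertools import groupby


def enforce_min_duration(labels, min_len):
    """Relabel runs shorter than min_len with the label of the preceding run.

    Illustrative merge rule only (an assumption, not the thesis procedure).
    """
    # Collapse the per-sample sequence into [label, run_length] pairs.
    runs = [[label, len(list(group))] for label, group in groupby(labels)]

    merged = []
    for label, length in runs:
        if merged and (length < min_len or merged[-1][0] == label):
            # Absorb a too-short run (or a same-label neighbour) into the previous run.
            merged[-1][1] += length
        else:
            merged.append([label, length])

    # A too-short first run has no predecessor, so fold it into its successor.
    if len(merged) > 1 and merged[0][1] < min_len:
        merged[1][1] += merged[0][1]
        merged.pop(0)

    # Expand the runs back into one label per sample.
    return [label for label, length in merged for _ in range(length)]


predicted = ["run", "run", "run", "jump", "run", "run", "pass", "pass", "pass"]
smoothed = enforce_min_duration(predicted, min_len=2)
```

In this example the single-sample "jump" is treated as physically implausible and relabelled as "run", while the length of the sequence is preserved.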
In the context of tennis, we encountered difficulties in predicting serve performance metrics from sensor data. While predicting the ball speed is straightforward, accurately predicting the velocity-accuracy index (VA index), which combines ball speed with serve accuracy, proved more complex. To assess whether our model's predictions can be distinguished from noise, we applied a permutation test. Notably, the main contribution of this research lies in the rigorous formulation of the null hypothesis for this test, linking it to established permutation test theory.
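To make the idea of a permutation test concrete, here is a generic sketch. It assumes the test statistic is the Pearson correlation between predictions and observed values, and that under the null hypothesis the observed values are exchangeable with respect to the predictions. Both choices, and all names in the code, are assumptions for illustration; the abstract does not state the statistic or the exact null hypothesis formulated in the thesis.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(y_true, y_pred, n_permutations=2000, seed=0):
    """One-sided permutation p-value for the correlation between y_true and y_pred."""
    rng = random.Random(seed)
    observed = pearson(y_true, y_pred)
    shuffled = list(y_true)
    exceed = 0
    for _ in range(n_permutations):
        # Shuffling the targets breaks any link between targets and predictions.
        rng.shuffle(shuffled)
        if pearson(shuffled, y_pred) >= observed:
            exceed += 1
    # The +1 terms count the observed arrangement itself, so p is never zero.
    return (exceed + 1) / (n_permutations + 1)


# Hypothetical data: predictions that track the targets exactly.
targets = [float(i) for i in range(20)]
predictions = [2.0 * v + 1.0 for v in targets]
p_value = permutation_p_value(targets, predictions)
```

A small p-value indicates that the observed agreement between predictions and targets would be unlikely if the pairing were arbitrary.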
This research contributes to the fields of sports science and data analysis by offering insights into activity recognition and performance prediction using sensor data. The methodologies developed here have potential applications in various other sports as well as in activities unrelated to sports. Although the data for this research come from wearable sensors, the models and procedures can also be applied to other types of sensor data, or even beyond sensor data.
To gain this insight, synthetic data was created and used to make synthetic scans. The signal-to-noise ratio of a target spectrum was calculated, and Monte Carlo simulations were used to reveal hidden patterns in the data. In the high-contrast scenario, multi-area whitening was employed and the cosine similarity between the target spectrum and its signature was determined. It was observed that the shape and intensity of the whitened target spectrum differ depending on whether pixels or wavelengths are used as observations. However, both are subject to the ‘bleeding’ effect. Further, it was found that if the number of pixels in the scan is greater than the number of spectral bands (548), then the signal-to-noise ratio improves as the number of whitened pixels in the scan increases. In the high-contrast scenario, multi-area whitening guarantees the uniformity of the spectra, resulting in a higher cosine similarity between the target spectrum and its signature. But as multi-area whitening uses a smaller number of pixels in the scan, it cannot be concluded whether multi-area whitening is better than global whitening, as it is not known how the increase in cosine similarity and the decrease in signal-to-noise ratio relate to the classification process. Finally, it is concluded that when working with real and unknown data, using pixels as observations is much more feasible.
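For reference, the cosine similarity used above to compare a target spectrum with its signature can be computed as in the following sketch. The function name and the example vectors are hypothetical, and the whitening step itself is not shown.

```python
import math


def cosine_similarity(target, signature):
    """Cosine of the angle between two spectra given as equal-length sequences."""
    dot = sum(t * s for t, s in zip(target, signature))
    norm_t = math.sqrt(sum(t * t for t in target))
    norm_s = math.sqrt(sum(s * s for s in signature))
    if norm_t == 0.0 or norm_s == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_t * norm_s)


# Hypothetical three-band spectra: same shape, different intensity.
similarity = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Because the measure is invariant to positive scaling, it compares the shape of two spectra rather than their intensity.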
Football activity recognition
Improving and testing football activity recognition based on signal data using deep learning
Afterwards, the pipeline was used to evaluate larger datasets containing football drills and a football physiotherapy training session. For this, a sliding window evaluation procedure was proposed. These evaluations gave promising results. Many actions and football-related activities could be recognized; however, many smaller, shorter actions were missed. This can be attributed to a lack of training data: the data contained few activities with the ball, so the deep learning models could not be trained adequately. Later, it was investigated whether additional training on activities with the ball improved the evaluation results. This was indeed confirmed, since the evaluations showed more detailed and realistic results. Including even more training data could result in the pipeline performing reliably in real-life football scenarios.
Mathematics as a secret weapon against criminals
Employing score-based likelihood ratio systems for the comparison of handwriting and studying their quality of performance
To achieve this, we developed a novel model to describe the evolution of a vegetation index (such as RVI) during the growth season. Unlike existing models, the model presented in this thesis includes the effect of precipitation deficit, both as a temporary inhibitor of the vegetation index and as a long-term influence on crop growth. The model is non-linear in many of its parameters. Therefore, heuristic calibration methods are unavoidable. We show that two standard calibration methods, non-linear least squares and differential evolution, are outperformed by a hybrid of both methods that we specifically designed for this application.
After calibrating the model to time series of 1167 potato parcels in the north-east of the Netherlands, we investigate different ways to cluster the model parameters. We propose explanations for three important clusterings through their RVI time series and (speculative) environmental factors. Comparison with information on irrigated parcels for the years 2018-2020 reveals a statistically significant correlation between some of the clusters and irrigation. However, the variation in irrigation rate never exceeded a factor of two. Therefore, no accurate classifier can be built based on these clusters.
We recommend two important ways to improve the current implementation. Firstly, the baseline RVI is consistently overestimated, resulting in mostly negative normalized RVI. Because of this, the model cannot properly describe precipitation deficit-driven fluctuations in the RVI. These fluctuations are an important part of system behaviour, so improving the estimation of the baseline RVI should be the first priority for future research.
Secondly, the exact irrigation dates of a set of parcels would be very useful. Comparing these dates to the corresponding RVI time series would make it possible to uncover features of the RVI evolution that are indicators of irrigation. The model parameterization can then be tuned to optimize sensitivity to these features.