A.J. Cabo | TU Delft Repository

Fitting statistical models for studying teachers' influence on students' math anxiety

Bachelor thesis (2026) - T.J.M. Schelling, A.J. Cabo, M.E. Kootte

Math anxiety, defined as fear or anxiety in math-related situations, is an increasing problem worldwide. This type of anxiety is experienced not only by students but also by adults, in situations such as paying with cash. This trend is concerning because individuals with high levels of math anxiety often avoid math-related activities, including educational and career opportunities. There is already a shortage of professionals in technical fields, and this shortage will not decrease while math anxiety keeps rising.

Research into how to reduce the math anxiety of people begins at (high) school. Studies have confirmed that high school teachers have an influence on the increase or decrease of students' math anxiety. The key question is how this influence is expressed. Previous studies have investigated this question using a variety of statistical models. The purpose of this study is to determine which statistical model explains the most about the influence of teachers on students' math anxiety.

To investigate this question, data were collected at a high school in the Netherlands via questionnaires for teachers and students. The teacher survey contained the topics: mindset, instruction, and teacher emotions. Students completed questionnaires on their math anxiety and their perceived relationship with their teacher.

Subsequently, three models are constructed. The first is a linear regression model in which the average math anxiety of all students of one teacher is the response variable and the teacher characteristics are the explanatory variables. The second is a two-level mixed model that uses the interpersonal relationship of a class with its teacher as explanatory variables and the average math anxiety per class as the response variable. The last is a three-level mixed-effects model that uses the interpersonal relationship between student and teacher as explanatory variables and the math anxiety of that student as the response variable. Three variants of this student-level model are examined: one with clusters at both the class and teacher levels, and two two-level mixed models that each contain only one cluster level.
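As a minimal sketch of what a three-level random-intercept structure can look like (the notation is assumed for illustration and is not taken from the thesis), let $y_{ijk}$ be the math anxiety of student $i$ in class $j$ of teacher $k$, and let $x_{ijk}$ collect that student's perceived-relationship scores:

```latex
y_{ijk} = \beta_0 + x_{ijk}^{\top}\beta + u_k + v_{jk} + \varepsilon_{ijk},
\qquad
u_k \sim \mathcal{N}(0, \sigma_u^2),\quad
v_{jk} \sim \mathcal{N}(0, \sigma_v^2),\quad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma_\varepsilon^2).
```

Here $u_k$ is a teacher-level random intercept and $v_{jk}$ a class-level random intercept. Dropping either $u_k$ or $v_{jk}$ gives a two-level model with a single cluster level.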

The models were compared using $R^2$ measures, the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC). The results indicate that the models based on teacher-level average anxiety and student-level anxiety with teacher clustering provided the best fit to the data. Future research should examine generalized models and validate the findings using larger datasets.
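For readers unfamiliar with the two information criteria, the sketch below shows their standard definitions in terms of a fitted model's maximized log-likelihood. The function names and the numbers in the usage lines are hypothetical and are not results from the thesis.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical example: two candidate models fitted to the same 50 observations.
simple_model = {"loglik": -100.0, "k": 3}
richer_model = {"loglik": -98.5, "k": 6}

aic_simple = aic(simple_model["loglik"], simple_model["k"])
aic_richer = aic(richer_model["loglik"], richer_model["k"])
bic_simple = bic(simple_model["loglik"], simple_model["k"], n_obs=50)
bic_richer = bic(richer_model["loglik"], richer_model["k"], n_obs=50)
```

Lower values indicate a better trade-off between fit and complexity. BIC penalizes extra parameters more heavily than AIC once the sample has more than about seven observations, because $\ln(n) > 2$ from that point on.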

Bayesian Estimation of Multilevel Structural Equation Models: Prior Specification for Random Effects Variances

Bachelor thesis (2025) - J.R. Vonk, A.J. Cabo, H.M. Schuttelaars

Recurrence Quantification Analysis explained and applied to Educational Science

Bachelor thesis (2024) - B.M.L. Hogervorst, A.J. Cabo, E. Papageorgiou

Student Engagement (SE) is a critical factor when researching student performance. This thesis investigates the correlation between SE and performance through the lens of Complex Dynamical System (CDS) theory. Recognizing that SE is influenced by multiple interdependent variables that cause non-linear and not fully predictable behavior, SE is analyzed through Recurrence Quantification Analysis (RQA). RQA uses the distance between time series data points to visualize and quantify the dynamic characteristics of a CDS, such as repetition, periodicity and predictability. In this thesis, Time Spent, Attempts Made and On-Time Rate are used as indicators of SE to examine whether a correlation exists between student performance and recurrence in SE. To analyze whether such a relationship exists, a dataset of 144 civil and mechanical engineering students following a Linear Algebra course at TU Delft was used. Recurrence in SE was quantified using the following RQA variables: Recurrence Rate $(RR)$, Determinism $(DET)$, Average Diagonal Line Length $(Davg)$, Trapping Time $(TT)$, and Shannon's Entropy of diagonal line lengths $(ENTR)$. However, $ENTR$ was eventually not included, as it did not provide interpretable information regarding SE. The findings suggest that high-performing students show more engagement overall and less recurrence in some SE indicators, namely Time Spent per study session and Attempts Made per study session. These insights suggest that, to perform optimally, students should engage with online study materials as much as possible, in study sessions that vary in length and in the number of exercises attempted. This thesis concludes with recommendations for further research analyzing the effects between recurrence in SE and student performance. ...
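To make two of the RQA variables concrete, the sketch below computes a recurrence matrix, the Recurrence Rate and Determinism for a one-dimensional series. It is a simplified illustration under stated assumptions, not the thesis's procedure: no time-delay embedding is used, the main diagonal is excluded from both measures, and the threshold `eps`, the minimum line length `lmin` and the toy series are hypothetical.

```python
def recurrence_matrix(series, eps):
    """R[i][j] = 1 when points i and j are within distance eps of each other."""
    return [[1 if abs(a - b) <= eps else 0 for b in series] for a in series]


def recurrence_rate(R):
    """Fraction of off-diagonal cells that are recurrent."""
    n = len(R)
    recurrent = sum(R[i][j] for i in range(n) for j in range(n) if i != j)
    return recurrent / (n * (n - 1))


def diagonal_line_lengths(R):
    """Lengths of all runs of 1s along diagonals, skipping the main diagonal."""
    n = len(R)
    lengths = []
    for k in range(-(n - 1), n):
        if k == 0:
            continue  # the main diagonal is trivially recurrent
        run = 0
        for i in range(n):
            j = i + k
            if not 0 <= j < n:
                continue
            if R[i][j]:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return lengths


def determinism(R, lmin=2):
    """Share of recurrent points that lie on diagonal lines of length >= lmin."""
    lengths = diagonal_line_lengths(R)
    total = sum(lengths)
    if total == 0:
        return 0.0
    return sum(length for length in lengths if length >= lmin) / total


# Hypothetical toy series: a perfectly periodic signal.
toy_series = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
R = recurrence_matrix(toy_series, eps=0.1)
rr = recurrence_rate(R)
det = determinism(R, lmin=2)
```

For this strictly alternating series every recurrent point lies on a diagonal line, so the determinism is maximal. An irregular series would give shorter and fewer diagonal lines.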

Applying Structural Equation Modelling on a Motivation Survey

Master thesis (2023) - L.M. Leeuwestein, A.J. Cabo

In this research, Structural Equation Modelling was applied to analyze the relationships between latent variables measured by a Motivation Survey. Special attention was paid to the ordinal nature of the data. The model was split into several parts, containing Teacher Cues Monitoring and Scaffolding, Satisfaction and Frustration of the three needs Autonomy, Competence and Relatedness, and Autonomous and Controlled Motivation. The aim of this research was to find a suitable model and to discover how all of the latent variables mentioned above influence academic performance.
It turns out that Academic Performance was influenced most by Autonomous Motivation, which in turn was influenced most by Autonomy Satisfaction. When leaving out Autonomy Satisfaction, Relatedness and Competence Satisfaction also appear to have a positive influence on Autonomous Motivation, and these in turn are positively influenced by Monitoring and especially Scaffolding. The results have implications for educational purposes, in the sense that they give some insight into teachers’ involvement in students’ academic performance.
...

A comparison of the frequentist and Bayesian approach to multinomial logistic regression in statistics: an application to study habits data from PRIME

Bachelor thesis (2023) - S.H. Schriemer, A.J. Cabo, T.W.C. Vroegrijk

Frequentist statistics and Bayesian statistics are the two main approaches to statistical inference. The frequentist approach is commonly integrated into academic curricula, while the Bayesian approach is less frequently employed. However, a comparison of the approaches that further investigates their shortcomings and advantages might give a better comprehension of statistics and more insight into statistical inference. Therefore, the current study applied both the frequentist and the Bayesian approach to multinomial logistic regression.

The multinomial logistic regression model can be described as a generalized linear model and as a random utility model, and the current study has shown that these models generate an equivalent probability function. Moreover, the methods of estimating coefficients in the frequentist and in the Bayesian approach were described. The multinomial logistic regression model was subsequently applied to data from educational research conducted by PRIME. Three different R packages were used to perform the multinomial logistic regression: the VGAM package (frequentist, generalized linear model), the mlogit package (frequentist, random utility model) and the UPG package (Bayesian, random utility model). The results of the analysis of one dependent variable were subsequently compared.

The results indicated that the frequentist and Bayesian approaches differ in their estimation time and model fit: the Bayesian approach required more computational time, but resulted in a better model fit. The frequentist 95% confidence intervals and Bayesian 95% credible intervals are comparable, but their interpretation is considerably different due to the philosophical underpinnings of both approaches. Moreover, in the Bayesian approach, existing knowledge and information can be incorporated by choosing the prior distribution. Furthermore, the Bayesian approach gives a posterior distribution, which is more informative than only a point estimate. In comparing the three R packages, it is noted that all three have a slightly different theoretical background. Since the packages all have their own shortcomings and advantages, combining them when conducting multinomial logistic regression could be desirable.
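The probability function shared by both formulations of the multinomial logistic model can be sketched as follows. This is a generic illustration with a reference category whose coefficients are fixed at zero; the function name, coefficient values and predictor vector are hypothetical and are not taken from the PRIME data or from any of the three R packages.

```python
import math


def multinomial_logit_probs(x, betas):
    """Category probabilities for one observation.

    x     : list of predictor values (include a leading 1.0 for an intercept)
    betas : one coefficient list per non-reference category; the reference
            category has all coefficients fixed at zero for identification
    """
    scores = [0.0] + [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example with three categories and two predictors.
x_example = [1.0, 2.0]
betas_example = [[0.0, 0.0], [math.log(2.0), 0.0]]
probs = multinomial_logit_probs(x_example, betas_example)
```

In the random utility reading, each linear score is the systematic part of a category's utility, and the same probabilities arise when the random parts are independent standard Gumbel errors.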

Evaluating the effect of emotions on academic achievement in mathematics

A quantile regression and Bayesian quantile regression approach

Bachelor thesis (2023) - B.D.W. Janssen, A.J. Cabo

Quantile regression is a useful method for analysing data: compared with the commonly used ordinary least squares regression, its estimates are more robust to outliers and its conditional distributions are more reliable for asymmetric distributions. Besides this, quantile regression analysis might also provide extra information on the conditional relations between the response variable and the explanatory variables. It has therefore previously been used in educational sciences, among many other research areas. Bayesian statistics is an upcoming approach for computing estimates, as it allows prior knowledge to be modelled. The Bayesian quantile regression approach produces accurate parameter estimates by specifying prior distributions, likelihood estimators and MCMC methods to model an informative posterior distribution. In this research, the theory behind quantile regression and the Bayesian quantile regression approach is considered, in particular Bayesian quantile regression for ordinal longitudinal data. This theory is then applied to data on academic emotions to analyse their effect on the attained grades of engineering students. Multiple aspects of quantile regression are included to analyse this effect, regarding gender, time and correlations between academic emotions. It was found that quantile regression produced insights that were ignored by ordinary least squares regression, as the effects of anxiety altered over different quantiles. Especially when separating genders, the effect of anxiety seemed to differ considerably between genders and between different fractions of the response variable. Furthermore, an assumption for Bayesian quantile regression is made by specifying an exponentially distributed prior and by separating the gender distributions, as was found by the estimates of the quantile regression approach. ...
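The robustness to outliers mentioned above comes from the loss function that quantile regression minimizes. The sketch below shows the standard check (pinball) loss and, for the simplest intercept-only case, how minimizing it recovers a sample quantile. The function names and the toy data are hypothetical and unrelated to the thesis data.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(ys, tau):
    """Intercept-only quantile regression.

    Searches the observed values for the constant that minimizes the total
    check loss; a minimizer can always be found among the data points.
    """
    return min(ys, key=lambda c: sum(check_loss(y - c, tau) for y in ys))


# Hypothetical grades with one extreme outlier.
toy_values = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(toy_values, tau=0.5)
upper_fit = best_constant(toy_values, tau=0.9)
mean_fit = sum(toy_values) / len(toy_values)
```

The least-squares fit (the mean) is pulled far towards the outlier, while the median fit is not. Varying `tau` gives a separate fit for each part of the distribution.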

Employing latent profile analysis to identify student motivational profiles

Bachelor thesis (2022) - P. Huisman, A.J. Cabo, L.Y.J. Wong

Latent profile analysis (LPA) is a statistical modeling approach used to identify hidden subpopulations (i.e., latent profiles) within a population. These latent profiles are identified based on values of observed continuous variables, also known as profile indicators. While LPA is becoming more popular in education sciences and psychology as a way to group people based on similar characteristics, very little has been written about its mathematical formulation. In this thesis, the mathematical foundations of LPA are introduced and explained. This leads to a discussion of the assumptions of the model.
After investigating the mathematical foundations of LPA, we applied LPA to identify different profiles of motivation in a student population at Delft University of Technology. We used a set of survey data measuring four types of motivation (i.e., profile indicators). Results of the analysis showed that there are four different student motivational profiles, each consisting of a different combination of the four types of motivation. ...
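As a sketch of the kind of formulation involved (a common textbook form with assumed notation, not necessarily the one used in the thesis), LPA can be written as a finite Gaussian mixture with $K$ profiles and $M$ indicators that are independent within each profile:

```latex
f(y_i) = \sum_{k=1}^{K} \pi_k \prod_{m=1}^{M}
         \mathcal{N}\!\left(y_{im};\, \mu_{km},\, \sigma_{km}^{2}\right),
\qquad \pi_k \ge 0,\quad \sum_{k=1}^{K} \pi_k = 1,

P(c_i = k \mid y_i) =
  \frac{\pi_k \prod_{m=1}^{M} \mathcal{N}\!\left(y_{im};\, \mu_{km},\, \sigma_{km}^{2}\right)}
       {\sum_{l=1}^{K} \pi_l \prod_{m=1}^{M} \mathcal{N}\!\left(y_{im};\, \mu_{lm},\, \sigma_{lm}^{2}\right)}.
```

Here $y_{im}$ is the score of student $i$ on indicator $m$, $\pi_k$ is the share of profile $k$ in the population, and $c_i$ is the unobserved profile of student $i$. Each student is typically assigned to the profile with the highest posterior probability.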

Bayesian Structural Equation Modeling

Explained and Applied to Educational Science

Bachelor thesis (2021) - A.C. Brouwer, A.J. Cabo, N.J. van der Wal

Structural equation modeling (SEM) is frequently used in social sciences to analyze relations among observed and latent variables and test theoretical propositions regarding relations among these latent variables. Frequentist SEM relies on Maximum Likelihood Estimation, and although this method works well for many simple situations, its performance is unsatisfactory when dealing with complex models or small sample sizes. In search of a method that resolves those problems, Bayesian SEM has been developed recently. These models produce more accurate parameter estimates. The Bayesian approach to SEM offers the possibility of incorporating prior knowledge into SEM, allowing for model extension and improvement. In this research, the theory of both frequentist and Bayesian SEM is described. Subsequently, Bayesian SEM is illustrated with an application in educational sciences. A method is proposed to specify prior distributions that use correlation estimates found in previous research to reflect prior information and our confidence in that information. The results obtained by an informative prior model are analyzed and compared to the results of a noninformative, weakly informative, and frequentist model. It was found that the informative prior model produces more accurate estimates than the noninformative and weakly informative prior models, indicating the correctness of the specified priors. ...

Estimating links between latent variables using Structural Equation Modeling (SEM) in R

Bachelor thesis (2020) - V.R. Plomp, A.J. Cabo

Structural equation modeling is a statistical analysis technique used to analyse structural relationships between observed variables and unobserved latent variables, and can be used to estimate links between latent variables. The technique is most commonly used in the field of psychology, and is not well known in the field of mathematics. Literature on structural equation modeling often lacks mathematical formulation. In this work, the mathematical theory of the method is discussed and, where needed, a mathematical formulation is introduced. This is done by discussing the five steps in which the method can be summarized. After analysing the mathematical theory of the method, we illustrate the method by applying it to collected data, and we interpret the results. ...