P. Keer

A Discussion, and Exact Unconditional Tests for r×c Tables

Master thesis (2023) - P. Keer, H.P. Lopuhaä, Øyvind Bakke
Every time one counts the number of occurrences of each pair of values of two categorical variables, one obtains a contingency table. Such tables are among the simplest data representations for statistically testing whether the two variables under consideration are associated. Although contingency tables occur naturally in many scientific disciplines, there is still much debate on the appropriate way to perform significance tests on them.
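As a minimal illustration of the counting described above, the following Python sketch cross-tabulates two categorical variables into an r×c table using `collections.Counter`. The function name `contingency_table` and the toy smoker/illness observations are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def contingency_table(xs, ys):
    """Cross-tabulate two equally long sequences of categorical values.

    Returns (row_labels, col_labels, table), where table[i][j] counts how often
    the pair (row_labels[i], col_labels[j]) occurs.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    counts = Counter(zip(xs, ys))  # occurrences of each (x, y) pair
    rows = sorted(set(xs))
    cols = sorted(set(ys))
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


# Hypothetical toy data: five observations of two categorical variables.
rows, cols, table = contingency_table(
    ["smoker", "smoker", "non-smoker", "non-smoker", "smoker"],
    ["ill", "healthy", "healthy", "healthy", "ill"],
)
print(rows, cols, table)  # ['non-smoker', 'smoker'] ['healthy', 'ill'] [[2, 0], [1, 2]]
```

The row and column totals of `table` are the marginal totals whose treatment (fixed or random) is at the heart of the debate discussed next.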

Especially when one wants to use exact methods, i.e., methods based on the exact probabilities of observing the table of interest, there is great disagreement on which marginal totals should be treated as fixed for inference. This has led to the development of conditional tests, most famously Fisher's exact test, and unconditional tests, of which Barnard's CSM test was the first example. Mostly due to philosophical objections and computational challenges, unconditional tests have received far less attention over the years. This is especially true for contingency tables with more than 2 rows or columns. To our knowledge, no implementations of exact unconditional tests are available for these larger tables.
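To make the conditional approach concrete, here is a minimal sketch of Fisher's exact test for a 2×2 table. Conditioning on both the row and the column totals leaves a single free cell, and under independence its count follows a hypergeometric distribution. The two-sided p-value below sums the probabilities of all tables with the same margins that are no more likely than the observed one. This is a common convention and is assumed here. The function name is hypothetical, and this is not code from the thesis.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher exact test for a 2x2 table [[a, b], [c, d]].

    With row totals r1, r2 and first column total c1 fixed, the top-left
    count is hypergeometric under the null hypothesis of independence.
    """
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Probability that the top-left cell equals x, given all margins.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Sum over all tables with these margins that are at most as likely as the
    # observed one (small relative tolerance for floating-point ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


print(fisher_exact_2x2([[3, 1], [1, 3]]))  # 34/70, about 0.486
```

Only tables sharing *both* margins with the observed table enter the sum. That restriction is exactly what the unconditional approach relaxes.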

The aim of this text is twofold. First, we give a historical account of the rivalry between conditional and unconditional tests, and argue that there is a case for researching exact unconditional methods in greater depth. Second, we present implementations of exact unconditional tests that are applicable to general r×c contingency tables. Some of these implementations generalise existing methods for the 2×2 table, such as Barnard's CSM test, with some additions to increase computational efficiency. In addition, we introduce a new approach that translates the classical Neyman-Pearson procedure of constructing a critical region for a given significance level α into a mixed integer linear programming problem. This problem can be solved efficiently with one of many existing optimisation software packages.
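By contrast with the conditional approach, an exact unconditional test for a 2×2 table fixes only the row totals n1 and n2. It treats the common success probability π as a nuisance parameter and takes the supremum over π of the probability of the critical region.

This sketch makes some simplifying choices:
- It orders tables by a pooled Z statistic. This is a swapped-in choice, not Barnard's CSM ordering and not one of the thesis's r×c implementations.
- It approximates the supremum on a finite grid of π values, so the result can slightly underestimate the exact p-value.
- All names are hypothetical.

```python
from math import comb, sqrt


def z_pooled(x1, n1, x2, n2):
    """Absolute pooled Z statistic for the difference of two proportions."""
    p = (x1 + x2) / (n1 + n2)
    if p == 0.0 or p == 1.0:
        return 0.0  # all successes or all failures: no evidence either way
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return abs(x1 / n1 - x2 / n2) / se


def binom_pmf(k, n, pi):
    return comb(n, k) * pi ** k * (1 - pi) ** (n - k)


def unconditional_pvalue(x1, n1, x2, n2, grid=1000):
    """Unconditional p-value for a 2x2 table with row totals n1, n2 fixed.

    Under H0 both rows are Binomial(n_i, pi) with the same unknown pi; the
    p-value is the supremum over pi of P_pi(Z >= z_obs), approximated here
    on the grid pi = 1/grid, ..., (grid - 1)/grid.
    """
    z_obs = z_pooled(x1, n1, x2, n2)
    # Critical region: every table at least as extreme as the observed one
    # (small tolerance so that floating-point ties are included).
    region = [(i, j) for i in range(n1 + 1) for j in range(n2 + 1)
              if z_pooled(i, n1, j, n2) >= z_obs - 1e-12]
    best = 0.0
    for k in range(1, grid):
        pi = k / grid
        size = sum(binom_pmf(i, n1, pi) * binom_pmf(j, n2, pi) for i, j in region)
        best = max(best, size)
    return best
```

For example, in `unconditional_pvalue(5, 5, 0, 5)` the critical region consists of only the two most extreme tables. The supremum 2·0.5⁵·0.5⁵ = 2/1024 ≈ 0.00195 is attained at π = 0.5. The maximisation over π is what makes these tests computationally heavier than conditional ones, and the cost grows quickly for r×c tables.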

This eventually leads to a power study comparing 14 different tests, 12 of which are unconditional, for different table dimensions and marginal totals. Although no test is the most powerful in every situation, the tests using a linear programming formulation have comparable, and often higher, power than the classical unconditional approaches. This comes at a cost, however: the critical regions produced by this optimisation approach are not guaranteed to be nested, i.e., the critical region for a smaller α is not necessarily contained in the one for a larger α. This limits their use and interpretability. Further research should determine whether additional requirements can be formulated that make the critical regions nested while keeping the advantages of the linear programming formulation. ...
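Nestedness can be checked directly when critical regions are stored as sets of tables. The sketch below uses hypothetical names and toy regions. It tests the subset property for increasing α and, only when that property holds, reports the smallest listed α at which a table is rejected. Without nesting, that quantity no longer behaves like a p-value, because a table rejected at one level may be accepted at a larger one.

```python
def is_nested(regions):
    """regions: list of (alpha, set_of_tables); True if regions grow with alpha."""
    ordered = sorted(regions, key=lambda pair: pair[0])
    return all(small <= large for (_, small), (_, large) in zip(ordered, ordered[1:]))


def smallest_rejecting_alpha(table, regions):
    """Smallest listed alpha whose critical region contains table (None if none)."""
    if not is_nested(regions):
        raise ValueError("critical regions are not nested")
    for alpha, region in sorted(regions, key=lambda pair: pair[0]):
        if table in region:
            return alpha
    return None


# Hypothetical 2x2 tables, written as tuples of rows, and toy critical regions.
t1, t2, t3 = ((5, 0), (0, 5)), ((4, 1), (0, 5)), ((4, 1), (1, 4))
nested = [(0.01, {t1}), (0.05, {t1, t2}), (0.10, {t1, t2, t3})]
not_nested = [(0.01, {t1}), (0.05, {t2, t3})]
print(is_nested(nested), is_nested(not_nested))  # True False
print(smallest_rejecting_alpha(t2, nested))      # 0.05
```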
Bachelor thesis (2021) - P. Keer, A. Cipriani, J.M. Thijssen, W.G.M. Groenevelt, A.R. Akhmerov, Alberto Chiarini
Level-set percolation on the Discrete Gaussian Free Field (DGFF) has been an active topic in mathematical physics over the last few years. In particular, the DGFF on ℤ^d with homogeneously weighted nearest-neighbour interactions, i.e. all conductances equal to 1, has been studied in detail, and such models can be simulated very efficiently. In this research, we drop the homogeneity requirement and consider three-dimensional DGFFs with arbitrary conductances. Our goal is to find a fast and reliable method to simulate such DGFFs on a finite lattice. Since this is essentially a high-dimensional Gaussian sampling problem, we investigated using the Conjugate Gradients (CG) linear solver as a Gaussian sampler. To assess its performance, we compared our implementation of the CG sampler with known methods for DGFFs in the unit-conductance case. Finally, as a showcase of our implementation, we studied level-set percolation on a DGFF with a simple checkerboard conductance pattern. Our main conclusion is that the CG algorithm is well suited to simulating Discrete Gaussian Free Fields. Since it makes no assumptions on the conductances, it can be used to generate DGFFs with arbitrary conductances. However, our implementation still has a number of issues. The most important concerns the stopping tolerance of the CG sampler: once the tolerance is set below some lattice-size-dependent threshold, the percolative behaviour of the resulting sample changes drastically. We have not been able to explain this. Moreover, we recommend making the implementation suitable for parallel computing. We were limited to relatively small lattice sizes during this project; consequently, the use of certain finite-size scaling arguments when analysing level-set percolation may not always have been fully justified.
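The idea of using CG as a Gaussian sampler can be sketched in pure Python: each CG search direction p_k contributes z_k·p_k/√(p_kᵀQp_k) to the sample, with z_k independent standard normals. This sketch rests on several assumptions and is not the thesis's implementation:
- The DGFF on the box {0,…,L−1}³ has density proportional to exp(−½ φᵀQφ), where Q is the weighted graph Laplacian with zero boundary values. Normalisation conventions for the DGFF vary.
- The names `dgff_matvec`, `cg_sample` and `STEPS` are hypothetical.

```python
import math
import random

# Unit steps to the six nearest neighbours in Z^3.
STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def dgff_matvec(v, L, cond):
    """Return Q v for the (assumed) DGFF precision matrix Q on {0,...,L-1}^3.

    Q[x][x] = sum of cond(x, y) over all six neighbours y (also those outside
    the box, where the field is 0) and Q[x][y] = -cond(x, y) for neighbours y
    inside the box. cond must be positive and symmetric: cond(x, y) == cond(y, x).
    """
    def index(site):
        x, y, z = site
        return (x * L + y) * L + z

    out = [0.0] * (L ** 3)
    for x in range(L):
        for y in range(L):
            for z in range(L):
                site = (x, y, z)
                i = index(site)
                total = 0.0
                for dx, dy, dz in STEPS:
                    nb = (x + dx, y + dy, z + dz)
                    c = cond(site, nb)
                    total += c * v[i]
                    if all(0 <= t < L for t in nb):
                        total -= c * v[index(nb)]
                out[i] = total
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg_sample(matvec, n, rng, tol=1e-8, max_iter=None):
    """Draw an approximate N(0, Q^{-1}) sample with the CG sampler.

    matvec(v) must return Q v for a symmetric positive definite Q of size n.
    CG is run on Q x = b for a random right-hand side b; the sample is the sum
    of z_k p_k / sqrt(p_k^T Q p_k) over the search directions p_k.
    """
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    sample = [0.0] * n
    r = b[:]  # residual of Q x = b for the starting guess x = 0
    p = r[:]
    rr = dot(r, r)
    bb = rr
    for _ in range(n if max_iter is None else max_iter):
        qp = matvec(p)
        d = dot(p, qp)
        gamma = rr / d
        coeff = rng.gauss(0.0, 1.0) / math.sqrt(d)
        for i in range(n):
            sample[i] += coeff * p[i]
            r[i] -= gamma * qp[i]
        rr_new = dot(r, r)
        if rr_new <= tol * tol * bb:  # relative residual below tolerance
            break
        beta = rr_new / rr
        p = [r[i] + beta * p[i] for i in range(n)]
        rr = rr_new
    return sample
```

A call such as `cg_sample(lambda v: dgff_matvec(v, L, cond), L**3, random.Random(0))` returns one field as a flat list indexed by (x·L + y)·L + z. Here `cond` is any symmetric, positive conductance function. Because the search directions are Q-conjugate, the sample has covariance Q⁻¹ only on the Krylov subspace explored before the stopping rule fires. The tolerance `tol` therefore affects the distribution of the output, not just the accuracy of the underlying linear solve.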
Finally, based on our study of the DGFF on a lattice with checkerboard conductances a and b (a < b), we conjectured that, in its percolative behaviour, this DGFF resembles a DGFF defined on a lattice with constant conductance c, where c is a weighted average of a and b. The weight of a is expected to be larger than the weight of b. ...
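Written out, the conjecture reads as follows. Here w is a hypothetical name for the weight of a, whose value the abstract does not specify. Since the weights are nonnegative and sum to 1, "the weight of a is larger than the weight of b" means w > 1/2, and the conjectured effective conductance is then pinned between a and the arithmetic mean of a and b:

```latex
c = w\,a + (1-w)\,b, \qquad \tfrac12 < w \le 1,
\quad\Longrightarrow\quad
a \;\le\; c \;=\; \frac{a+b}{2} - \left(w - \tfrac12\right)(b-a) \;<\; \frac{a+b}{2}.
```

The strict upper bound uses a < b and w > 1/2. The lower bound uses w ≤ 1.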