High-dimensional Pearson's chi-squared test
C.I.M. van Wingerde (TU Delft - Electrical Engineering, Mathematics and Computer Science)
F. Mies – Mentor (TU Delft - Statistics)
D. Kurowicka – Graduation committee member (TU Delft - Applied Probability)
Abstract
This paper revisits Pearson's chi-squared test and studies its properties, focusing on the behavior of the test for large supports, i.e., when the number of cells is large relative to the sample size. First, we explore this behavior through a controlled simulation, which shows that the test makes more type I errors than its nominal level allows when the sample size is small relative to the number of cells. We then explain this behavior using a generalized central limit theorem, which shows that the number of cells must be $o\left(\sqrt{\frac{n}{\log{n}}}\right)$ for the limiting chi-squared distribution to apply.
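The abstract does not describe the simulation design in detail. The Python sketch below is one way such a controlled experiment could be set up, assuming NumPy and SciPy are available. It draws many samples from a known null distribution, computes Pearson's statistic $\sum_i (O_i - E_i)^2 / E_i$, and records how often it exceeds the asymptotic $\chi^2_{k-1}$ critical value. The function names `pearson_statistic` and `type_one_error_rate`, the Monte Carlo settings, and the skewed null used in the demo are hypothetical illustrations, not the thesis's own setup.

```python
import numpy as np
from scipy.stats import chi2


def pearson_statistic(counts, probs):
    """Pearson's statistic sum_i (O_i - E_i)^2 / E_i along the last axis."""
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    n = counts.sum(axis=-1, keepdims=True)  # sample size of each replicate
    expected = n * probs                     # E_i = n * p_i under the null
    return ((counts - expected) ** 2 / expected).sum(axis=-1)


def type_one_error_rate(n, probs, alpha=0.05, reps=10_000, seed=0):
    """Monte Carlo rejection rate when the null hypothesis `probs` is true.

    Hypothetical helper for illustration: it compares the statistic with the
    asymptotic chi-squared critical value on k - 1 degrees of freedom.
    """
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    k = probs.size
    critical = chi2.ppf(1 - alpha, df=k - 1)
    # Each row is one sample of size n drawn from the null multinomial
    counts = rng.multinomial(n, probs, size=reps)
    stats = pearson_statistic(counts, probs)
    return float(np.mean(stats > critical))


if __name__ == "__main__":
    n = 200
    for k in (5, 50, 500):
        # Assumed skewed null for the demo: linearly decreasing cell weights
        weights = np.arange(k, 0, -1, dtype=float)
        probs = weights / weights.sum()
        rate = type_one_error_rate(n, probs, reps=2_000)
        print(f"k={k:4d}  n={n}  estimated type I error = {rate:.3f}")
```

Holding $n$ fixed and increasing $k$, then comparing the estimated rate with the nominal $\alpha$, reproduces the kind of comparison the abstract describes. For scale, with the natural logarithm and $n = 10{,}000$, the reference quantity $\sqrt{n/\log n}$ is roughly 33. Because the condition is stated in $o(\cdot)$ form, this number indicates only the order of growth, not a hard cutoff.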