Memory-aware optimization of mass-univariate statistical inference on EEG datasets

Accelerating the statistical testing pipeline of the Neurophysiological Biomarker Toolbox using memory-aware data layouts, vectorization, and native execution

Bachelor Thesis (2026)
Author(s)

P.O. van Egmond (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Arthur Ervin Avramiea – Mentor

Ricardo Guerra Marroquim – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
20-06-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper investigates memory-aware optimization of mass-univariate EEG statistical inference in the Neurophysiological Biomarker Toolbox. A vectorized Python implementation and a native Rust backend are evaluated as optimized alternatives to the toolbox's existing NumPy/SciPy-based statistical testing pipeline. Both optimized implementations reorganize EEG biomarker data for cohort-based access to better support cache locality, SIMD execution, and parallel processing. On synthetic benchmarks, they achieve speedups over the existing pipeline of up to 452.3x for the vectorized Python implementation and up to 486.1x for the Rust backend. The optimized implementations are also much less sensitive to increasing biomarker counts, showing substantially weaker runtime growth across the measured benchmark space. Profiling shows increased SIMD density and CPU utilization, whereas cache behaviour improves only modestly. These results suggest that the primary limitation is not the statistical operation itself but the overhead introduced by how the workload is structured and executed. Much of the available speedup can therefore be achieved by expressing the computation as larger batched, vectorized operations.
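The contrast between a per-test loop and one batched, vectorized operation can be illustrated with a minimal sketch. It assumes that each subject's biomarkers are stored as a (channels × biomarkers) NumPy array, that the per-feature test is Welch's two-sample t-statistic, and that p-values are omitted. The function names `to_cohort_array`, `welch_t_loop`, and `welch_t_vectorized` are hypothetical and are not the toolbox's API or the thesis's actual implementation.

```python
import numpy as np


def to_cohort_array(subjects):
    """Stack per-subject (channels, biomarkers) arrays into one C-contiguous
    (subjects, channels, biomarkers) array for a single cohort."""
    return np.ascontiguousarray(np.stack(subjects, axis=0), dtype=np.float64)


def welch_t_loop(a, b):
    """Baseline: compute one scalar Welch t-statistic per (channel, biomarker)
    pair in Python-level loops, i.e. many small operations."""
    n_ch, n_bm = a.shape[1], a.shape[2]
    t = np.empty((n_ch, n_bm))
    for c in range(n_ch):
        for k in range(n_bm):
            x, y = a[:, c, k], b[:, c, k]  # strided slice across subjects
            se = np.sqrt(x.var(ddof=1) / x.size + y.var(ddof=1) / y.size)
            t[c, k] = (x.mean() - y.mean()) / se
    return t


def welch_t_vectorized(a, b):
    """Batched: the same statistic for every (channel, biomarker) pair at once,
    reducing over the subject axis (axis 0) in a few large NumPy calls."""
    na, nb = a.shape[0], b.shape[0]
    se = np.sqrt(a.var(axis=0, ddof=1) / na + b.var(axis=0, ddof=1) / nb)
    return (a.mean(axis=0) - b.mean(axis=0)) / se


# Usage with synthetic data: 64 channels x 20 biomarkers, two cohorts.
rng = np.random.default_rng(0)
group_a = to_cohort_array([rng.normal(size=(64, 20)) for _ in range(30)])
group_b = to_cohort_array([rng.normal(0.2, 1.0, size=(64, 20)) for _ in range(25)])
t_loop = welch_t_loop(group_a, group_b)
t_vec = welch_t_vectorized(group_a, group_b)
```

Both functions compute the same statistics. The vectorized version replaces 1,280 small per-feature computations with a handful of array-wide reductions, which is the kind of restructuring the abstract credits for most of the speedup.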

Files

Paper.pdf
(pdf | 1.08 Mb)
License info not available