Predictive Analysis of Anti-NMDA-Receptor Encephalitis

using a Random Forest Classifier on EEG Data

Abstract

During the initial phase of diagnosis, patients with anti-NMDA-receptor encephalitis (anti-NMDARE) often experience severe symptoms that significantly impact their quality of life. Anti-NMDARE is an autoimmune disorder affecting the brain, and electroencephalography (EEG) plays a vital role in its diagnosis and treatment. Identifying EEG patterns associated with a positive or negative prognosis is crucial for adjusting treatment intensity. An improved understanding of diagnosis, prognosis and treatment could enhance the quality of life of anti-NMDARE patients. This thesis aimed to analyse EEG data with Machine Learning (ML) to predict which patients show a positive recovery after 12 months of standard treatment.

To predict the outcome after 12 months, a Random Forest (RF) classifier was constructed using the available EEG features. The EEG dataset had a clustered structure because it contained multiple values of each EEG feature per patient. Three approaches were considered to handle this clustering: ignoring clustering, reducing clustering to independent observations, and explicitly accounting for clustering. The first two options were explored in this research. Another prominent challenge encountered early in the research was class imbalance, which was addressed by under- and oversampling the dataset.
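As an illustration of the second approach, reducing clustering to independent observations, the sketch below collapses repeated measurements into one row per patient using the mean, minimum, and maximum of each feature. It is a minimal sketch under stated assumptions, not the thesis implementation: the `patient_id` key, the function name `reduce_clusters`, and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def reduce_clusters(rows, feature_names):
    """Collapse repeated EEG measurements into one summary row per patient.

    rows: list of dicts, each with a 'patient_id' key (assumed name)
          and one numeric value per feature.
    Returns: dict mapping patient_id -> {<feature>_mean, _min, _max}.
    """
    # Group all measurement rows that belong to the same patient.
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["patient_id"]].append(row)

    reduced = {}
    for patient_id, patient_rows in grouped.items():
        summary = {}
        for name in feature_names:
            values = [r[name] for r in patient_rows]
            summary[f"{name}_mean"] = mean(values)
            summary[f"{name}_min"] = min(values)
            summary[f"{name}_max"] = max(values)
        reduced[patient_id] = summary
    return reduced


# Toy data (invented for illustration): two rows for p1, one for p2.
rows = [
    {"patient_id": "p1", "deltapower": 0.40, "sampleentropy": 1.2},
    {"patient_id": "p1", "deltapower": 0.60, "sampleentropy": 1.0},
    {"patient_id": "p2", "deltapower": 0.30, "sampleentropy": 1.5},
]
reduced = reduce_clusters(rows, ["deltapower", "sampleentropy"])
```

After this step each patient contributes exactly one observation, so the rows can be treated as independent, at the cost of discarding within-patient variation beyond the three summary statistics.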

For the simulation sets, under- and oversampling did not have the desired effect: the normal (non-resampled) sets performed comparably to, or even better than, the under- and oversampled sets. For the real dataset, however, under- and oversampling improved the performance scores. Reducing the clusters to independent observations yielded lower performance scores than ignoring clustering, both for the simulated and for the real data. Furthermore, in both cases, RF models using the EEG sets outperformed those using principal component analysis (PCA) on the clustered EEG set.
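The abstract does not state which resampling method was used. As a hedged illustration, the sketch below implements plain random under- and oversampling, which is one common choice and an assumption here; the function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Return a dict mapping each class label to its feature rows."""
    groups = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    return groups


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        # Draw with replacement to fill the gap up to the majority size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        # Draw without replacement down to the minority size.
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


# Toy imbalanced set: 8 observations of class 0 and 2 of class 1.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
over_counts = Counter(y_over)
under_counts = Counter(y_under)
```

Resampling is normally applied to the training portion only, so that the evaluation data keeps the original class distribution; whether the thesis followed this convention is not stated in the abstract.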

Although the performance scores were not yet optimal, the features most important for determining the class labels were identified, providing a good understanding of the dataset. Mean Decrease in Impurity (MDI) and the SHAP algorithm highlighted the significance of connectivity-related features in the setting where clusters were reduced to independent observations. The relevance of these features became evident upon calculating their mean, minimum, or maximum. In the EEG setting, MDI emphasized the importance of the features deltapower and sampleentropy and of occipital-related features. These features remained important in the reduced set. SHAP, in addition to prioritizing the same features, offered insight into how individual features contribute to the prediction for a specific observation, enhancing interpretability.
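MDI is built from the impurity decrease achieved by each split in the forest's trees. As a hedged illustration of that building block only, the sketch below computes the Gini impurity decrease of a single threshold split at a root node. The function names and toy values are hypothetical, and a full MDI score would additionally sum these decreases over all nodes that split on a feature and average over trees.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(values, labels, threshold):
    """Gini decrease obtained by splitting on `values <= threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    # Children are weighted by the share of samples they receive.
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


# Toy example (invented values): one feature separates the classes, one does not.
labels = [0, 0, 1, 1]
informative = [0.1, 0.2, 0.8, 0.9]
uninformative = [0.1, 0.8, 0.2, 0.9]

gain_informative = impurity_decrease(informative, labels, threshold=0.5)
gain_uninformative = impurity_decrease(uninformative, labels, threshold=0.5)
```

In this toy case the informative feature removes all impurity at the root, while the uninformative one removes none, which is the kind of contrast that MDI aggregates into a ranking of features.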

The main challenges for the RF classifier in the case of anti-NMDARE are class imbalance and the accurate classification of the minority class. Under- and oversampling techniques successfully improved the classification of minority class observations for the original EEG set. In conclusion, this set is recommended over the other sets when aiming to classify EEG features. However, this set overlooks the clustering aspect, leaving room for future research to address this limitation. Additionally, it is recommended to explore the potential of a Convolutional Neural Network (CNN) for accurate classification of raw EEG signals, which was beyond the scope of this research.