Introduction: In critically ill paediatric patients, sleep is essential for recovery and development, yet sleep disturbances are common in the paediatric intensive care unit (PICU), highlighting the need to integrate sleep monitoring into clinical practice. While automated sleep stage classification using machine learning (ML) on single-channel electroencephalography (EEG) data has shown promise, mostly in healthy adult populations, its application to critically ill children is challenged by age-specific sleep architecture, medication effects, pathological conditions, and artefacts. This study evaluates whether deep learning (DL) feature extraction and dynamic models can improve sleep staging performance in this population and explores the use of an unsupervised ML model to gain deeper insight into its complex sleep structures.
Methods: This study used EEG recordings from three datasets (healthy adults, non-critically ill children, and critically ill children) to train and evaluate supervised and unsupervised sleep stage classification models. In the supervised models, a convolutional neural network (CNN) was used for feature extraction, followed by dynamic models, namely a long short-term memory (LSTM) network and a hidden Markov model (HMM), to account for temporal dependencies in sleep. Additionally, an unsupervised HMM was applied to explore underlying structures in the sleep EEG data without predefined labels.
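The Methods do not state how the HMM is coupled to the CNN. One common option is a hybrid HMM: the CNN's per-epoch softmax posteriors are divided by the class priors to give scaled likelihoods, which serve as emission scores, and Viterbi decoding with a transition matrix estimated from training hypnograms yields the most likely stage sequence. The minimal Python sketch below illustrates that idea under stated assumptions: three states coded 0 = Wake, 1 = NREM, 2 = REM, and hypothetical helper names (`estimate_transitions`, `class_priors`, `viterbi_smooth`). It is not the study's implementation.

```python
import numpy as np

# Assumed label coding for three-state staging: 0 = Wake, 1 = NREM, 2 = REM.
N_STATES = 3


def estimate_transitions(label_seqs, n_states=N_STATES, pseudocount=1.0):
    """Estimate the HMM transition matrix from training hypnograms.

    Counts epoch-to-epoch label changes and adds a pseudocount so that
    transitions never seen in training keep a small non-zero probability.
    """
    counts = np.full((n_states, n_states), pseudocount)
    for seq in label_seqs:
        for prev, curr in zip(seq[:-1], seq[1:]):
            counts[prev, curr] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def class_priors(label_seqs, n_states=N_STATES):
    """Relative frequency of each stage in the training labels."""
    labels = np.concatenate([np.asarray(s) for s in label_seqs])
    return np.bincount(labels, minlength=n_states) / labels.size


def viterbi_smooth(posteriors, trans, priors, eps=1e-12):
    """Most likely stage sequence given per-epoch classifier posteriors.

    posteriors: (T, K) softmax outputs, one row per epoch (e.g. from a CNN).
    Dividing by the priors turns posteriors into scaled likelihoods, which
    serve as the HMM emission scores (the "hybrid" HMM approach).
    """
    log_emit = np.log(posteriors + eps) - np.log(priors + eps)
    log_trans = np.log(trans + eps)
    n_epochs, n_states = log_emit.shape

    # Initial state distribution is taken to be the class priors.
    delta = np.log(priors + eps) + log_emit[0]
    backptr = np.zeros((n_epochs, n_states), dtype=int)
    for t in range(1, n_epochs):
        # scores[i, j]: best path ending in state i at t-1, then moving to j.
        scores = delta[:, None] + log_trans
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]

    # Trace back the best path from the most likely final state.
    path = np.empty(n_epochs, dtype=int)
    path[-1] = delta.argmax()
    for t in range(n_epochs - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path


# Toy usage: one training hypnogram and a 7-epoch test night with a single
# ambiguous epoch that the classifier alone would score as REM.
train = [[0] * 10 + [1] * 20 + [2] * 10 + [1] * 10]
trans = estimate_transitions(train)
priors = class_priors(train)
posteriors = np.array([[0.1, 0.8, 0.1]] * 3 + [[0.1, 0.4, 0.5]] + [[0.1, 0.8, 0.1]] * 3)
print(posteriors.argmax(axis=1).tolist())  # → [1, 1, 1, 2, 1, 1, 1]
print(viterbi_smooth(posteriors, trans, priors).tolist())  # → [1, 1, 1, 1, 1, 1, 1]
```

In this toy case, the learned transitions strongly favour staying in NREM, so the isolated REM epoch is relabelled as NREM. Some implementations feed the posteriors to the HMM directly instead of dividing by the priors; the division is a design choice that keeps frequent classes from dominating the emissions.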
Results: Supervised models achieved good performance in healthy adults and non-critically ill children, with maximum accuracies of 90.2\% and 77.4\%, respectively, for three-state classification. The added value of the dynamic models over the CNN alone was inconsistent, varying by dataset and model type. In critically ill children, classification performance was low, with a maximum accuracy of 61.4\% and notably low macro-F1 and Cohen’s kappa scores (45.9\% and 26.5\%, respectively). The unsupervised HMM showed that identifying distinct clusters that remain stable over time was challenging in all datasets. For critically ill children, the model often failed to identify multiple distinct clusters within individual patients, and cluster assignments varied substantially across patients.
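The gap between accuracy and the macro-F1 and Cohen's kappa scores is easier to interpret with the definitions at hand. Macro-F1 averages the per-class F1 scores without weighting by class size, and kappa discounts the agreement expected by chance from the marginal label frequencies, κ = (p_o − p_e) / (1 − p_e). The self-contained Python sketch below computes all three metrics from a confusion matrix. The label coding and the toy data are illustrative assumptions, not study data.

```python
import math

# Assumed label coding: 0 = Wake, 1 = NREM, 2 = REM.


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts epochs with true stage t predicted as stage p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm):
    n = sum(map(sum, cm))
    return sum(cm[k][k] for k in range(len(cm))) / n


def macro_f1(cm):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN).

    A class with no true and no predicted epochs gets F1 = 0 here; other
    conventions exist.
    """
    k_classes = len(cm)
    scores = []
    for k in range(k_classes):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(k_classes)) - tp
        fn = sum(cm[k]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / k_classes


def cohen_kappa(cm):
    """kappa = (p_o - p_e) / (1 - p_e), with p_e from the row/column totals."""
    k_classes = len(cm)
    n = sum(map(sum, cm))
    p_o = sum(cm[k][k] for k in range(k_classes)) / n
    p_e = sum(
        sum(cm[k]) * sum(cm[i][k] for i in range(k_classes)) for k in range(k_classes)
    ) / n**2
    # Undefined when chance agreement is already perfect (a single class used).
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else math.nan


# Toy example (not study data): a classifier that always predicts NREM.
y_true = [1] * 8 + [0, 2]
y_pred = [1] * 10
cm = confusion_matrix(y_true, y_pred, 3)
print(accuracy(cm), round(macro_f1(cm), 3), cohen_kappa(cm))  # → 0.8 0.296 0.0
```

On this toy input, always predicting the majority class gives an accuracy of 0.8 but a macro-F1 of about 0.30 and a kappa of 0. This kind of divergence under class imbalance is why macro-F1 and kappa are informative alongside accuracy.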
Discussion: This study demonstrates that DL-based feature extraction and dynamic modelling using single-channel EEG can achieve strong sleep staging performance in healthy adults and non-critically ill children, highlighting the potential for (semi-)automated scoring tools in more stable populations. In contrast, performance in critically ill children was notably lower, likely due to factors such as high variability in sleep architecture, signal artefacts, limited data quality, and the uncertain reliability of manually assigned labels. These results suggest that conventional sleep stages do not generalise well to this population and that a purely data-driven, unsupervised approach does not offer a viable alternative. Overall, the findings emphasise the need for a larger dataset of critically ill children, further evaluation of relevant patient and data characteristics, the inclusion of alternative signals such as electrocardiography, and a greater focus on model interpretability.