Psychiatric disorders are highly prevalent among adolescents, yet current diagnostic processes largely depend on time-intensive interviews and subjective assessments. Objective, scalable, and non-invasive tools are urgently needed to support early detection and monitoring. Speech acoustics represent a promising biomarker, as subtle variations in pitch, rhythm, and fluency may reflect underlying psychiatric symptoms.
This thesis investigates whether acoustic features of adolescent speech can be used to classify psychiatric symptoms through machine learning. Data came from the longitudinal iBerry study, which followed a large cohort of adolescents in the Rotterdam area. Speech samples from semi-structured interviews were combined with self-reported questionnaires, including the Youth Self-Report (YSR) and the PQ-16 psychosis screener. Acoustic features were extracted using established open-source toolkits and reduced via minimum redundancy maximum relevance (mRMR) selection. Three machine learning models (Random Forest, XGBoost, and Support Vector Machines) were trained using cross-validation, class weighting, and oversampling techniques to address class imbalance. Model performance was evaluated using F1-score, weighted accuracy, precision-recall curves, and receiver operating characteristic (ROC) analyses. Fairness checks assessed the influence of age and sex.
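The pipeline outlined above can be sketched in scikit-learn. This is an illustrative toy, not the thesis code: synthetic, imbalanced data stands in for the acoustic features, a mutual-information filter is used as a simple stand-in for mRMR selection, and only one of the three classifiers (Random Forest with class weighting) is shown.

```python
# Minimal sketch of the classification pipeline: feature selection,
# a class-weighted classifier, and cross-validated F1 evaluation.
# Synthetic data replaces the real acoustic features; SelectKBest with
# mutual information approximates (but is not) mRMR selection.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Imbalanced two-class problem, mimicking a rare symptom outcome.
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=8,
    weights=[0.85, 0.15], random_state=0,
)

pipe = Pipeline([
    ("select", SelectKBest(mutual_info_classif, k=10)),
    ("clf", RandomForestClassifier(class_weight="balanced", random_state=0)),
])

# Stratified folds preserve the class ratio in each split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="f1")
print(round(scores.mean(), 3))
```

Placing selection inside the `Pipeline` ensures features are chosen only from each training fold, avoiding the leakage that would inflate cross-validated scores.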
Results showed that predictive performance across most symptom domains was modest, with weighted accuracies generally only slightly above 50\% and F1-scores often low due to imbalanced data. Broader constructs, such as internalising and total problem scores, were somewhat better captured than narrow symptom subscales. These findings indicate that speech acoustics alone may be insufficient for reliable clinical classification, though they may still contribute as one element within a multimodal assessment strategy.
Beyond technical performance, this study highlights important methodological, ethical, and clinical considerations. Class imbalance, developmental changes in adolescent voices, and reliance on self-reported outcomes constrain model performance. At the same time, issues of bias, transparency, and privacy raise challenges for clinical implementation. Future research should focus on assembling larger and more diverse datasets, incorporating longitudinal monitoring, and integrating multimodal data sources such as linguistic or physiological measures. By addressing these challenges, speech-based models may eventually provide valuable tools to support scalable and objective psychiatric assessment.