Deep Learning for Freezing of Gait Assessment using Inertial Measurement Units: A Multicentre Study
Po-Kai Yang (Katholieke Universiteit Leuven)
Juha Carlon (Katholieke Universiteit Leuven)
Maaike Goris (Katholieke Universiteit Leuven)
Emilie Klaver (Radboud University Medical Center)
Jorik Nonnekes (Radboud University Medical Center)
Richard J A van Wezel (Radboud Universiteit Nijmegen, University of Twente)
Lisa Alcock (Newcastle Upon Tyne Hospitals NHS Foundation Trust, Newcastle University)
Alison J Yarnall (Newcastle Upon Tyne Hospitals NHS Foundation Trust, Newcastle University)
Lynn Rochester (Newcastle Upon Tyne Hospitals NHS Foundation Trust, Newcastle University)
Clint Hansen (University Hospital Schleswig-Holstein)
Christian Schlenstedt (University Hospital Schleswig-Holstein)
Walter Maetzler (University Hospital Schleswig-Holstein)
David Buzaglo (Tel Aviv Sourasky Medical Center)
Marina Brozgol (Tel Aviv Sourasky Medical Center)
Jeffrey M Hausdorff (Rush Alzheimer's Disease Center, Tel Aviv Sourasky Medical Center, Tel Aviv University)
Alice Nieuwboer (Katholieke Universiteit Leuven)
Moran Gilat (Katholieke Universiteit Leuven)
Pieter Ginis (Katholieke Universiteit Leuven)
Bart Vanrumste (Katholieke Universiteit Leuven)
B. Filtjens (TU Delft - Transport and Logistics)
Abstract
Video annotation is the gold-standard method to assess Freezing of Gait (FOG) in Parkinsonian disorders, but it is time-consuming. Deep learning (DL)-based assessment of FOG using inertial measurement units (IMUs) reduces this burden but poses challenges of its own. In particular, the large heterogeneity between patients and assessment methods may degrade detection performance when a model is applied to independent cohorts. To evaluate the effects of this heterogeneity, we developed a DL model on a local cohort (85 participants; 2043 trials) and validated it across six external cohorts (256 participants; 1058 trials). Model-expert agreement on the percentage of time frozen was strong locally (ICC=0.886 [0.79,0.90]) but reduced in external cohorts (ICC=0.562±0.141). Fine-tuning the DL model with just 50 minutes of external cohort data improved the ICC to 0.732±0.138, which lies at the lower end of the inter-rater agreement range (ICC=0.73-0.99). Therefore, while unified standards are still being developed, we propose an expert-in-the-loop workflow as an effective intermediary and present a proof-of-concept web-based platform for fine-tuning and expert review (aidfog.be).
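As an illustration of the agreement measure reported above, the following minimal sketch computes the percentage of time frozen from per-sample FOG labels and an intraclass correlation coefficient between two raters (for example, expert and model). It rests on assumptions that the abstract does not confirm: the function names are hypothetical, labels are taken to be binary per-sample annotations (1 = FOG, 0 = no FOG), and the ICC variant shown is ICC(2,1) (two-way random effects, absolute agreement, single measurement), which may differ from the form used in the study.

```python
from typing import Sequence


def percent_time_frozen(labels: Sequence[int]) -> float:
    """Percentage of samples in one trial labelled as FOG (1) rather than no FOG (0)."""
    if len(labels) == 0:
        raise ValueError("trial has no samples")
    return 100.0 * sum(1 for v in labels if v == 1) / len(labels)


def icc_absolute_agreement(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1), assumed variant: two-way random effects, absolute agreement.

    ratings[i][j] is the score given to subject (or trial) i by rater j.
    """
    n = len(ratings)            # number of subjects
    k = len(ratings[0])         # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from a two-way ANOVA without replication.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_err) / denom


# Made-up labels for three short trials, purely for illustration.
expert = [[0, 1, 1, 0], [0, 0, 0, 1], [1, 1, 1, 0]]
model = [[0, 1, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1]]
pairs = [
    [percent_time_frozen(e), percent_time_frozen(m)]
    for e, m in zip(expert, model)
]
agreement = icc_absolute_agreement(pairs)
```

Absolute agreement penalises a systematic offset between raters, so a model that consistently over- or under-estimates the percentage of time frozen scores lower than it would under a consistency-type ICC.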