2D Skeleton-Based Medical Temporal Segmentation
The effect of limited supervision approaches in 2D skeleton-based temporal segmentation of medical procedures
G. de Bakker (TU Delft - Mechanical Engineering)
J.J. van den Dobbelsteen – Mentor (TU Delft - Medical Instruments & Bio-Inspired Technology)
J.F.P. Kooij – Mentor (TU Delft - Intelligent Vehicles)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Temporal segmentation of medical procedures holds the potential to improve patient safety, provide decision support to clinicians, and serve as the basis for context-aware robotic assistance systems. However, clinical adoption remains hindered by two key challenges: the scarcity of annotated data and limited generalizability across diverse surgical settings. This thesis therefore explores 2D skeleton-based temporal segmentation as a privacy-preserving and data-efficient alternative to conventional RGB-based methods. Using the CAG-skeleton dataset, which consists of pose sequences extracted from external cardiac angiography (CAG) recordings, the study investigates various model architectures and limited supervision strategies for identifying 14 procedural phases.
A two-stage framework combining a skeleton-based feature extractor with a temporal model was adopted. A review and comparison of proven models indicated that pairing a PR-GCN or MS-G3D feature extractor with an LSTM or TCN temporal model holds the most promise in the low-data medical domain. After training all combinations on low-data subsets of the CAG-skeleton dataset, all models were found to outperform a non-learning baseline that always predicts the mean procedure. Among the learning models, differences in clip-wise segmentation accuracy were not statistically significant, but LSTM-based models showed a statistically significantly better understanding of sequential order. Considering both sequential metrics and computational efficiency, the PR-GCN + LSTM combination was selected for extensive evaluation, achieving a clip-wise segmentation accuracy of 83.95% when trained on 146 CAG procedures.
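To make the clip-wise accuracy metric and the non-learning baseline more concrete, the following is a minimal Python sketch. It rests on assumptions that the abstract does not confirm: the "mean procedure" is interpreted here as each phase occupying its average share of clips, with phases laid out in index order. The function names (`clipwise_accuracy`, `mean_phase_fractions`, `mean_procedure_prediction`) are hypothetical and chosen for illustration only; the thesis may define the baseline differently.

```python
from collections import Counter


def clipwise_accuracy(predicted, reference):
    """Fraction of clips whose predicted phase equals the reference phase."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def mean_phase_fractions(training_procedures, num_phases):
    """Average share of clips spent in each phase across training procedures."""
    totals = [0.0] * num_phases
    for labels in training_procedures:
        counts = Counter(labels)
        for phase in range(num_phases):
            totals[phase] += counts[phase] / len(labels)
    return [total / len(training_procedures) for total in totals]


def mean_procedure_prediction(fractions, num_clips):
    """Assumed baseline: phases in index order, each covering its mean share."""
    prediction = []
    cumulative = 0.0
    for phase, fraction in enumerate(fractions):
        cumulative += fraction
        end = round(cumulative * num_clips)
        prediction.extend([phase] * (end - len(prediction)))
    # Pad with the final phase if rounding left the sequence short.
    prediction.extend([len(fractions) - 1] * (num_clips - len(prediction)))
    return prediction[:num_clips]


# Toy data: two annotated procedures with three phases (illustrative only).
training = [[0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 2]]
fractions = mean_phase_fractions(training, num_phases=3)
baseline = mean_procedure_prediction(fractions, num_clips=12)
```

Under this interpretation the baseline ignores the input skeletons entirely, so any learned model that uses the pose sequence should exceed its `clipwise_accuracy` on held-out procedures.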
To further address the data scarcity challenge, two limited supervision approaches were explored. Transfer learning from the Kinetics-skeleton dataset yielded no statistically significant performance gains, suggesting that the knowledge learned from Kinetics-skeleton does not transfer effectively to the surgical domain, that the transferred information is relatively easy for the model to learn from scratch during training, or both. In contrast, pseudo-labeling via class-balanced self-training showed strong potential for reducing annotation requirements, as it consistently improved the models' clip-wise segmentation accuracy in the low-data regime.
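The selection step at the heart of class-balanced self-training can be sketched as follows. This is a simplified illustration, not the procedure used in the thesis: it assumes the model outputs one probability vector per unlabeled clip and that a fixed share of the most confident clips is kept within each predicted class. The function name `class_balanced_pseudo_labels` and the parameter `keep_fraction` are hypothetical.

```python
import math


def class_balanced_pseudo_labels(probabilities, keep_fraction):
    """Keep the most confident share of clips within each predicted class.

    probabilities: list of per-clip class probability lists.
    keep_fraction: share of clips to keep per class, in (0, 1].
    Returns a dict mapping clip index to its pseudo-label.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Group clips by their predicted (arg-max) class with the confidence.
    by_class = {}
    for index, probs in enumerate(probabilities):
        label = max(range(len(probs)), key=probs.__getitem__)
        by_class.setdefault(label, []).append((probs[label], index))

    # Select per class, so confident majority phases cannot crowd out
    # phases on which the model is less certain overall.
    selected = {}
    for label, scored in by_class.items():
        scored.sort(reverse=True)
        keep = max(1, math.floor(keep_fraction * len(scored)))
        for _, index in scored[:keep]:
            selected[index] = label
    return selected


# Toy predictions for six unlabeled clips and two phases (illustrative only).
unlabeled_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.60, 0.40],
    [0.55, 0.45],
    [0.45, 0.55],
    [0.30, 0.70],
]
pseudo = class_balanced_pseudo_labels(unlabeled_probs, keep_fraction=0.5)
```

In this toy case a single global confidence threshold of 0.75 would keep only clips predicted as phase 0, whereas per-class selection also retains the most confident phase 1 clip. The selected clips would then be added to the training set with their pseudo-labels before the next training round.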
Overall, this study introduces skeleton-based representation as a modality with considerable potential for medical temporal segmentation and highlights pseudo-labeling as an effective strategy for reducing annotation requirements.