Accurate quantification of in vivo knee kinematics is essential for understanding joint health and disorders such as osteoarthritis. Optical motion capture (MoCap) is widely used in gait analysis, but its accuracy is compromised by soft tissue artefacts, limiting bone-level precision. Fluoroscopy provides direct skeletal visualization, yet it faces several limitations, including the high sensitivity of 2D–3D registration to the initial pose estimate and the reliance on manual bone segmentation. Moreover, the potential of combining MoCap with fluoroscopy, in particular the role of MoCap in initializing registration for fluoroscopic knee tracking during walking, remains underexplored. The aim of this study is to develop and evaluate a multimodal framework that integrates MoCap with single-plane fluoroscopy to improve automatic 2D–3D bone registration during dynamic gait analysis. Specifically, the framework addresses three challenges: (1) temporal synchronization and spatial calibration of the multimodal system, (2) MoCap-based initial pose estimation, and (3) deep learning–based automated femur segmentation.

A synchronized acquisition setup was implemented, simultaneously recording 100 Hz MoCap data and 15 Hz fluoroscopic images during treadmill walking. Sub-frame temporal alignment was verified through a pendulum experiment. Spatial calibration was achieved using a rigid marker box and solved via the Perspective-n-Point (PnP) algorithm, while a dynamic correction procedure was introduced to update the camera pose across trials using rigid markers attached to the X-ray source. For initialization, anatomical reference cubes were constructed from both MoCap markers and segmented MRI data, and rigid registration between these cubes enabled transformation into the fluoroscopy frame. The resulting initial pose was refined using a two-stage optimization algorithm and evaluated against a reference initialization across four trials.

MoCap-based initialization produced anatomically plausible in-plane alignment (mean errors below 8 mm and 16°), but consistently showed a systematic depth offset of 30–40 mm. To reduce reliance on manual annotation, a 2D nnU-Net segmentation model was trained on six manually annotated fluoroscopic images. Despite the limited dataset, the model produced anatomically plausible femur segmentations, indicating its potential for automated workflows.

In conclusion, this thesis establishes a multimodal framework that combines MoCap-based initialization, dynamic camera calibration, and deep learning–based segmentation. The evaluation demonstrates both the feasibility and the limitations of the approach, providing a reproducible basis for quantitative analysis of dynamic knee kinematics and opening new avenues for more automated and personalized assessment of joint disorders.
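The abstract does not detail how sub-frame temporal alignment was computed from the pendulum experiment. A common approach, sketched below under that assumption, is to track the pendulum as a 1D trajectory in both modalities, resample both signals onto a common fine grid, and take the lag at the cross-correlation peak; the function name `estimate_lag` and the 1 kHz resampling rate are illustrative choices, not the thesis's actual implementation.

```python
import numpy as np

def estimate_lag(mocap_t, mocap_x, fluoro_t, fluoro_x, fs=1000.0):
    """Estimate the temporal offset between two pendulum trajectories
    (MoCap at 100 Hz, fluoroscopy at 15 Hz) by resampling both onto a
    common fine grid and maximizing the cross-correlation.

    Returns the offset in seconds; positive values mean `fluoro_x`
    leads `mocap_x` under numpy's correlation sign convention."""
    # Common time base covering the overlap of the two recordings.
    t0 = max(mocap_t[0], fluoro_t[0])
    t1 = min(mocap_t[-1], fluoro_t[-1])
    t = np.arange(t0, t1, 1.0 / fs)

    # Linear interpolation onto the common grid; zero-mean before correlating.
    a = np.interp(t, mocap_t, mocap_x)
    b = np.interp(t, fluoro_t, fluoro_x)
    a -= a.mean()
    b -= b.mean()

    # Full cross-correlation; the peak index gives the sample lag.
    corr = np.correlate(a, b, mode="full")
    lag_samples = np.argmax(corr) - (len(b) - 1)
    return lag_samples / fs
```

Resampling to a grid much finer than the 15 Hz fluoroscopy rate is what allows the estimated lag to resolve offsets below one fluoroscopic frame interval.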
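For the marker-box spatial calibration, OpenCV's `solvePnP` is a standard solver for the Perspective-n-Point problem named above. The sketch below assumes known intrinsics of the imaging chain and corresponding 3D marker positions (from MoCap) and 2D marker detections (from the fluoroscopic image); the function name and argument layout are illustrative assumptions.

```python
import numpy as np
import cv2

def calibrate_fluoro_pose(box_pts_3d, box_pts_2d, K, dist=None):
    """Recover the fluoroscopy camera pose in the MoCap frame from the
    rigid calibration box: 3D marker coordinates measured by MoCap
    (box_pts_3d, Nx3, in mm) and their detected projections in the
    fluoroscopic image (box_pts_2d, Nx2, in pixels).

    K is the 3x3 intrinsic matrix of the imaging chain; dist holds
    optional distortion coefficients. Returns (R, t) mapping MoCap
    coordinates into the camera frame: x_cam = R @ x_mocap + t."""
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(box_pts_3d, dtype=np.float64),
        np.asarray(box_pts_2d, dtype=np.float64),
        np.asarray(K, dtype=np.float64),
        dist,
        flags=cv2.SOLVEPNP_ITERATIVE,
    )
    if not ok:
        raise RuntimeError("PnP failed to converge")
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    return R, tvec.reshape(3)
```

The dynamic correction described above would then update (R, t) between trials from the tracked rigid markers on the X-ray source, rather than re-imaging the calibration box.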
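Rigid registration between the MoCap-derived and MRI-derived anatomical reference cubes has a closed-form solution when corner correspondences are known, via the Kabsch (Umeyama) construction; whether the thesis uses this exact solver is an assumption, and the function name is illustrative.

```python
import numpy as np

def kabsch(P, Q):
    """Closed-form rigid transform (R, t) minimizing ||R @ P_i + t - Q_i||^2
    over corresponding point sets P, Q (both Nx3), e.g. the corners of the
    MoCap-derived and MRI-derived reference cubes.

    Returns R (3x3 proper rotation) and t (3,) such that Q ~= P @ R.T + t."""
    P = np.asarray(P, dtype=np.float64)
    Q = np.asarray(Q, dtype=np.float64)
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)   # centroids
    H = (P - cP).T @ (Q - cQ)                 # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t
```

Composing this cube-to-cube transform with the PnP-derived camera pose is what would carry a MoCap-tracked bone pose into the fluoroscopy frame as the initialization for 2D–3D registration.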
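The systematic 30–40 mm depth offset is consistent with the weak depth sensitivity of single-plane projection. Under an idealized pinhole model (an assumption, not stated above), with focal length f and a point at (X, Y, Z) in the camera frame:

```latex
u = \frac{fX}{Z}, \qquad
\frac{\partial u}{\partial Z} = -\frac{fX}{Z^{2}} = -\frac{u}{Z}
\quad\Longrightarrow\quad
|\delta u| \approx \frac{|u|}{Z}\,|\delta Z|
```

A depth error \(\delta Z\) therefore produces an in-plane shift of only \(|u|/Z \cdot \delta Z\), which is small for points near the principal axis, so large out-of-plane offsets leave almost no image residual for a single-plane optimizer to act on.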