Solving the inverse bone remodelling problem using reinforcement learning
A feasibility study
G.B. Schlief (TU Delft - Mechanical Engineering)
M.A. Sharifi Kolarijani – Mentor (TU Delft - Team Amin Sharifi Kolarijani)
N. Tümer – Mentor (TU Delft - Biomaterials & Tissue Biomechanics)
L. Laurenti – Graduation committee member (TU Delft - Team Luca Laurenti)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Bone dynamically adapts its strength to its environment in a process called bone remodelling. Mechanical stimuli are the primary predictors of future bone density, and many metrics exist to quantify their relationship with bone density changes. The inverse bone remodelling problem consists of finding the loading conditions that produced a given bone density distribution; it is non-unique and ill-posed. Understanding this process is relevant to implant design and osteoporosis prevention. Existing methods, like least-squares approaches, are computationally intensive as they iteratively explore the design space. This thesis explores a novel Reinforcement Learning (RL) framework as a possible alternative because of its effective exploration of ill-posed inverse problems.
Before focusing on the Machine Learning (ML) methods, we developed a dataset of steady-state density-load samples, since existing datasets did not meet the requirements of the RL framework for stable training. The forward model used for data generation was adapted to enlarge the input space and to improve numerical stability and computational efficiency. The final model generated an extensive (100,000 samples), diverse (five loading types), high-quality dataset that the ML methods could rely on.
We used the dataset to train a Supervised Learning (SL) surrogate and an ensemble model capable of predicting the forward process (SSIM = 0.85 and SSIM = 0.87). The RL framework was designed by adapting the inverse problem into a sequential procedure. The surrogate model was used for reward estimation, which avoided the learning instabilities created by the non-uniqueness of the problem. To validate the capabilities of the RL framework, we trained the model and a baseline on a simplified, order-reduced version of the problem. It outperformed the SL baseline model (SSIM 0.72 compared to 0.28), demonstrating improved learning stability.
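The SSIM scores above compare predicted and reference density maps. As an illustration only (the thesis does not specify its SSIM implementation; windowed SSIM, e.g. skimage.metrics.structural_similarity, averages this quantity over local patches), a minimal single-window SSIM for maps scaled to [0, 1] can be sketched as:

```python
import numpy as np

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-window (global) SSIM between two arrays in [0, 1].
    c1 and c2 are the usual stabilising constants (0.01*L)^2 and
    (0.03*L)^2 for data range L = 1."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2)
    )

# Identical maps yield SSIM = 1; uncorrelated maps score lower.
a = np.linspace(0, 1, 64).reshape(8, 8)
print(global_ssim(a, a))
```

In practice a windowed variant is preferred, since a global mean and variance hide local density errors.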
The RL framework, consisting of a sequential decision-making process and a forward-estimation surrogate, smooths the reward domain and thereby stabilises learning. Furthermore, the dataset analysis showed that the inverse problem relies on identifiable samples, while the forward problem instead needs dataset diversity. Limitations of the approach are the dependency chain between the surrogate and the framework, and the inability to find more than one plausible solution. Future work could explore multi-agent strategies that propose multiple solutions, or mitigate the reward discontinuities created by the current metric.