PedVision

A manual-annotation-free and age-scalable segmentation pipeline for bone analysis in hand X-ray images

Journal Article (2026)
Author(s)

Morteza Homayounfar (TU Delft - Biomaterials & Tissue Biomechanics, Erasmus MC)

Sita M.A. Bierma-Zeinstra (Erasmus MC)

Amir Abbas Zadpoor (TU Delft - Biomaterials & Tissue Biomechanics)

Nazli Tümer (TU Delft - Biomaterials & Tissue Biomechanics)

Research Group
Biomaterials & Tissue Biomechanics
DOI
https://doi.org/10.1016/j.bspc.2025.108569
Publication Year
2026
Language
English
Volume number
112
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Medical image analysis often involves time-consuming annotation processes. Pediatric image analysis introduces additional complexity due to the scarcity of data, noise, and growth-related anatomical variations, particularly in bone analysis, where bone structures evolve more slowly than other organs. This study aims to develop a segmentation model that scales across different age groups, reduces annotation effort, and ensures high accuracy, particularly in low-quality images. To address these challenges, we propose a segmentation pipeline (PedVision) that first uses a Region of Interest (ROI) network to identify relevant regions, followed by a foundation model that translates each region into meaningful instances. These instances are then mapped to segmentation classes through an instance classifier (IC) network. To initiate the training rounds of the ROI and IC networks, we developed a fast, semi-automated annotation framework that leverages foundation models to annotate a subset of images using an object-level approach. In subsequent rounds, a human discriminator selects promising predictions made by the previous round's networks on unseen data, progressively enriching the model's training dataset for further fine-tuning of the networks. The networks are expanded from low-parameter to high-parameter models across rounds, following a curriculum learning approach to capture increasingly complex features. We evaluated PedVision on 552 hand X-ray images of children, retrieved from the publicly available Radiological Society of North America (RSNA) and Digital Hand Atlas (DHA) datasets, which represent a diverse range of ages and racial backgrounds. PedVision performed segmentation of 19 hand bones, grouped into five classes, and was compared against U-Net and DeepLabV3+ models using ResNet34 and ResNet101 backbones, as well as the SegFormer model with four different encoder variants.
For pediatric cases (i.e., 0–7 years), the PedVision pipeline outperforms the best-performing baseline models, achieving an 11.08% improvement in Dice score over U-Net on the RSNA dataset and a 7.68% improvement on the DHA dataset. Compared to DeepLabV3+, the improvements are even more substantial, with gains of 14.43% on RSNA and 14.78% on DHA. Additionally, PedVision shows notable advantages over the best SegFormer model, with improvements of 8.16% on RSNA and 1.91% on DHA. The project is open source at github.com/mohofar/PedVision.
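The three-stage inference path described in the abstract (ROI network → foundation model → instance classifier) can be sketched as a simple composition. This is a minimal illustrative stub, not the authors' implementation: `roi_network`, `foundation_model`, and `instance_classifier` are hypothetical placeholders standing in for the trained networks and the promptable foundation model.

```python
def roi_network(image):
    # Hypothetical stub: a trained ROI network would localize relevant
    # regions of the hand; here we simply split the image into two halves.
    h = len(image)
    return [(0, h // 2), (h // 2, h)]

def foundation_model(region):
    # Stand-in for the promptable foundation model that turns a region
    # into instance masks; here, one mask via thresholding at the mean.
    flat = [v for row in region for v in row]
    thr = sum(flat) / len(flat)
    return [[[v > thr for v in row] for row in region]]

def instance_classifier(mask):
    # Stand-in IC network: map an instance to one of the five bone
    # classes. Area modulo 5 is a placeholder decision rule only.
    area = sum(v for row in mask for v in row)
    return area % 5

def pedvision_inference(image):
    # Compose the three stages: ROI -> instance masks -> class labels.
    seg = [[0] * len(row) for row in image]  # 0 = background
    for (r0, r1) in roi_network(image):
        for mask in foundation_model(image[r0:r1]):
            label = instance_classifier(mask) + 1  # classes 1..5
            for i, row in enumerate(mask):
                for j, on in enumerate(row):
                    if on:
                        seg[r0 + i][j] = label
    return seg
```

The design point the pipeline relies on is the separation of concerns: the foundation model proposes class-agnostic instances, so only the lightweight ROI and IC networks need (semi-automated) supervision.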
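The round-based training scheme (human discriminator filtering predictions on unseen data, with model capacity growing across rounds) can likewise be sketched as a loop. All functions here (`fit`, `predict`, `human_accepts`) are hypothetical stand-ins assumed for illustration; in PedVision the trained ROI/IC networks and a human reviewer fill these roles.

```python
# Dummy stand-ins so the sketch runs; real PedVision trains ROI/IC networks
# and asks a human discriminator to accept or reject predictions.
def fit(data, capacity):
    return {"capacity": capacity, "n_train": len(data)}

def predict(model, x):
    return x % 5  # placeholder "prediction"

def human_accepts(x, y):
    return x % 2 == 0  # placeholder acceptance decision

def train_rounds(labelled, unlabelled, n_rounds=3):
    # Curriculum: expand from low- to high-parameter models across rounds.
    sizes = ["small", "medium", "large"]
    model = None
    for r in range(n_rounds):
        model = fit(labelled, capacity=sizes[min(r, len(sizes) - 1)])
        # Predict on unseen data; keep only human-approved predictions.
        accepted = [(x, predict(model, x)) for x in unlabelled
                    if human_accepts(x, predict(model, x))]
        labelled = labelled + accepted   # enrich the training set
        unlabelled = [x for x in unlabelled
                      if not human_accepts(x, predict(model, x))]
    return model
```

The human acts as a discriminator rather than an annotator: approving a good prediction is far cheaper than drawing masks, which is how the pipeline avoids manual annotation while still accumulating labelled data each round.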