G. van Tulder
17 records found
Optimising Labeling
The limits of weakly supervised osteophytes severity grading and localization in Hip X-Rays
osteoarthritis (OA), but grading their severity in specific hip locations is a time-consuming process that requires an expert. In many cases it is expensive to scale datasets with location-annotated severity labels from experts, whereas weak labels, which contain only the global presence of osteophytes, are much easier to obtain. This paper investigates whether such weak global labels can improve localized severity grading through a multitask deep learning framework.
We study a ResNet-18-based convolutional network that shares and updates its weights across two kinds of output heads: a global binary classification head and four regional ordinal heads for femur superior, femur inferior, acetabulum superior and acetabulum inferior. The model is trained under four supervision strategies: a strong-only configuration using only quadrant-level labels, a masked baseline that incorporates weakly labelled negatives via label propagation and ignores weak positives in the local loss, and two Multi-Instance Learning (MIL) variants that use a Noisy-OR loss to propagate weak positive labels to the quadrants. We systematically vary the ratio of weak to strong labels and evaluate performance using quadratic weighted Cohen’s kappa as the primary metric.
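As an illustration of the Noisy-OR idea mentioned above, the following minimal sketch aggregates four per-quadrant presence probabilities into one image-level probability and scores it against a weak label with binary cross-entropy. It is not the thesis implementation: the function names, the use of NumPy, and the assumption that each quadrant head yields a single "osteophyte present" probability are all hypothetical.

```python
import numpy as np

def noisy_or(p_regions, eps=1e-7):
    """Image-level probability: 1 - prod_i (1 - p_i) over the regions of one image."""
    p = np.clip(np.asarray(p_regions, dtype=float), eps, 1.0 - eps)
    return 1.0 - np.prod(1.0 - p, axis=-1)

def noisy_or_bce(p_regions, weak_label, eps=1e-7):
    """Binary cross-entropy between the Noisy-OR probability and the weak (global) label."""
    p_bag = np.clip(noisy_or(p_regions, eps), eps, 1.0 - eps)
    y = float(weak_label)
    return -(y * np.log(p_bag) + (1.0 - y) * np.log(1.0 - p_bag))

# Hypothetical per-quadrant probabilities (femur sup., femur inf., acetabulum sup., acetabulum inf.)
quadrant_probs = np.array([0.10, 0.20, 0.05, 0.60])
p_image = noisy_or(quadrant_probs)          # 1 - 0.9 * 0.8 * 0.95 * 0.4
loss_pos = noisy_or_bce(quadrant_probs, 1)  # weak label says "osteophyte somewhere"
loss_neg = noisy_or_bce(quadrant_probs, 0)  # weak label says "no osteophyte"
```

Under this formulation a weak positive label only requires that at least one quadrant is likely positive, which is why it can propagate image-level supervision to the regional heads without saying which quadrant is affected.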
Experiments show that the masked baseline with weak labels improves the regional kappa score compared to the strong-only configuration, while the MIL variants fail to outperform the baseline and can degrade performance at higher weak-to-strong ratios. We further observe that selecting checkpoints by minimal joint validation loss underestimates the achievable kappa score, because the global task converges faster, whereas selecting by maximal kappa score yields substantially better localized grading. Overall, the findings highlight the trade-off between localization and classification performance in weakly supervised multitask learning pipelines for regional osteophyte grading in hip X-rays.
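For readers unfamiliar with the primary metric, quadratic weighted Cohen's kappa can be sketched as below. This is a plain NumPy version written for illustration; the function name and the assumption of four severity grades (0 to 3) are not taken from the abstract.

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic disagreement weights (i - j)^2 / (K - 1)^2."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)

    # Observed confusion matrix: rows are true grades, columns are predicted grades.
    observed = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1.0

    grades = np.arange(n_classes)
    weights = (grades[:, None] - grades[None, :]) ** 2 / (n_classes - 1) ** 2

    # Expected matrix under independence of the two marginal distributions.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

    # Undefined (division by zero) when all labels and predictions share one grade.
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Toy check with assumed grades 0..3: one prediction is off by a single grade.
kappa_perfect = quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], n_classes=4)
kappa_one_off = quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 2], n_classes=4)
```

Because the weights grow with the squared distance between grades, a prediction that is off by one grade is penalised far less than one that is off by three, which suits ordinal severity grading.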
Annotation-Efficient Osteophyte Severity Estimation in Hip X-rays
Combining Binary Presence Labels with Limited OARSI Grade Supervision
osteoarthritis, is expensive because it requires expert annotation, whereas coarser binary presence labels are far easier to obtain. This study investigates how effectively these binary labels can be combined with a limited number of graded labels to estimate ordinal osteophyte severity in hip X-ray crops, and whether the choice of which samples to grade matters. We formulate the task as cumulative ordinal regression over four anatomical locations per hip, in which binary labels supervise the presence threshold and graded labels supervise the higher severity thresholds, while thresholds with no available grade are left unsupervised. A binary-only baseline detected osteophyte presence well and produced confidence scores that rose with true grade, but could not resolve the higher grades. A few graded labels enabled ordinal expected-severity estimates and reduced macro-averaged mean absolute error, with the largest gains at the smallest budgets and diminishing returns beyond. Comparing score-stratified sampling against random selection of the graded subset, the score-based strategy was competitive but not consistently better, indicating that most of the benefit comes from adding graded supervision rather than from how the samples are chosen. All results are reported on a held-out test set, averaged over three seeds. Combining many binary labels with relatively few graded labels is a promising way to reduce expert annotation burden while still producing useful ordinal severity estimates.
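The cumulative formulation described above can be sketched as follows for a single anatomical location. This is a hedged illustration, not the study's code: it assumes four grades (0 to 3), so three thresholds P(y >= k), and the helper names `threshold_targets`, `masked_bce` and `expected_severity` are invented here.

```python
import numpy as np

N_GRADES = 4  # assumption: grades 0..3, giving thresholds k = 1, 2, 3

def threshold_targets(presence, grade=None):
    """Return (targets, mask) over the cumulative thresholds P(y >= k)."""
    ks = np.arange(1, N_GRADES)
    if grade is not None:
        # Graded sample: every threshold is supervised.
        return (grade >= ks).astype(float), np.ones(N_GRADES - 1)
    # Binary-only sample: only the presence threshold (k = 1) is supervised.
    targets = np.zeros(N_GRADES - 1)
    targets[0] = float(presence)
    mask = np.zeros(N_GRADES - 1)
    mask[0] = 1.0
    return targets, mask

def masked_bce(probs, targets, mask, eps=1e-7):
    """Binary cross-entropy averaged over supervised thresholds only."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    bce = -(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))
    return float((bce * mask).sum() / max(mask.sum(), 1.0))

def expected_severity(probs):
    """E[y] = sum over k of P(y >= k) for a non-negative integer grade."""
    return float(np.sum(probs))

probs = np.array([0.9, 0.5, 0.1])                 # hypothetical model outputs
t_graded, m_graded = threshold_targets(1, grade=2)
t_binary, m_binary = threshold_targets(1)
severity = expected_severity(probs)
```

The mask is what leaves thresholds without an available grade unsupervised: for a binary-only sample the higher thresholds contribute nothing to the loss, so the model is never told that an ungraded positive is "only" grade 1.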
Anatomical Priors for Weakly Supervised Osteophyte Detection and Localization in Hip X-rays
Evaluating BoneFinder-Derived Guidance Under Image-Level Supervision
This work investigates whether anatomical priors derived from landmark points can improve weakly supervised osteophyte detection and localization in hip X-rays when only image-level labels are available. We propose modified ResNet-18 architectures that integrate anatomical guidance to highlight likely osteophyte regions.
We evaluate the proposed models across varying training data sizes. The results show that models with anatomical guidance generally outperform baseline models, with the most consistent improvements observed in classification metrics, while localization results are less conclusive. Additionally, experiments performed without guidance during testing led to reduced classification performance. Overall, the results suggest that anatomical priors provide useful complementary information for weakly supervised osteophyte detection, although they do not fully compensate for limited training data. Moreover, the benefit of guidance information varies across architectures and training set sizes.
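The abstract does not say how the landmark-derived guidance is constructed or fed to the network. Purely as an illustration of one possible form, the sketch below turns a set of landmark coordinates into a Gaussian prior map and stacks it with the image as an extra input channel. The function names, the Gaussian shape, the `sigma` value and the channel-stacking choice are all assumptions.

```python
import numpy as np

def landmark_prior(landmarks_xy, height, width, sigma=8.0):
    """Prior map in [0, 1]: maximum over Gaussian blobs centred on each landmark."""
    ys, xs = np.mgrid[0:height, 0:width]
    prior = np.zeros((height, width), dtype=float)
    for x, y in landmarks_xy:
        blob = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        prior = np.maximum(prior, blob)
    return prior

def add_guidance_channel(image, prior):
    """Stack a grayscale image and its prior map into a (2, H, W) network input."""
    return np.stack([image, prior], axis=0)

# Hypothetical landmarks (x, y) on a 64 x 64 crop.
image = np.zeros((64, 64), dtype=float)
prior = landmark_prior([(20, 30), (45, 10)], height=64, width=64)
guided_input = add_guidance_channel(image, prior)
```

A design like this would also make the reported test-time dependence plausible: a model trained with a guidance channel may rely on it, so removing the guidance at test time changes the input distribution.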
Landmark-Based Anatomical Priors as Penalty Masks in Weakly Supervised Learning
Effects on Classification Performance and Heatmap Distribution in Hip Osteophyte Detection
We develop a framework combining input-distribution diagnostics, label-distribution analysis, and bidirectional cross-domain model evaluation to assess whether observed differences are consistent with annotation shift. The approach is evaluated through controlled synthetic experiments and experiments using osteoarthritis radiographs.
Across both settings, annotation shift produces characteristic directional asymmetries in cross-domain prediction errors that differ from the signatures of prevalence and acquisition shifts. These asymmetries provide a basis for distinguishing annotation shift from other forms of domain shift, enabling more reliable interpretation of cross-domain model failures.
Adversarial generative models applied to diagnosing Osteoarthritis
Evaluating different techniques for fine-tuning discriminator models to classify osteoarthritis
Self-supervised feature learning for diagnosing hip osteoarthritis in X-ray
How effectively can a VAE’s latent space reflect osteoarthritis severity and enable diagnostic accuracy under label scarcity and label noise?
Improving Generalizability in X-Ray Segmentation of the femur
Evaluating the Impact of Traditional Data Augmentation Techniques on the generalizability across Datasets
Challenges in Domain Adaptation for Medical Image Segmentation
A Study on Generalization of Hip X-Ray Segmentation for Osteoarthritis
X-Ray Image Segmentation of the Hip Joint
Segmentation of the hip joint space based on a radial projection originating from the center of the femoral head
For this joint space profile, the distance between the femoral head and the acetabular roof needs to be calculated. Therefore, the positions of these parts of the hip joint need to be known. These can be retrieved from, for example, a segmentation mask.
One way of calculating the distance in a joint is to use a radial projection. A radial projection projects points from a curved space onto a plane by casting lines from a central point along increasing angles.
In this paper, we investigate how the joint space profile can be segmented most accurately from a radial projection originating from the center of the femoral head by comparing several noise-filtering and edge-finding algorithms. We show that a custom algorithm based on the theory behind edge detection in noisy images works most reliably and accurately.
There are still multiple points of improvement for this algorithm. The femoral head can be segmented more accurately than the acetabular roof. The segmentation of the latter could be improved by detecting the brightest line (peaks) as its edge, instead of the most sudden change (steepest gradient) in the X-ray image that is used as the edge for the femoral head. The algorithm could be further improved by handling local outliers on those edges.
In conclusion, this paper compares multiple ways of segmenting the joint space of the hip joint. The best-performing algorithm could in the future be used in an assisting tool for doctors to highlight important irregularities and measurements in the hip joint space.
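To make the radial projection concrete, the sketch below samples an image along rays cast from a centre point and then marks the steepest intensity change on each ray. It is a simplified baseline under stated assumptions, not the custom algorithm from the paper: nearest-neighbour sampling, no noise filtering, and the function names and parameters are invented for illustration.

```python
import numpy as np

def radial_projection(image, center_xy, n_angles=180, n_radii=60,
                      angle_range=(0.0, np.pi)):
    """Unroll the image around center_xy into an (n_angles, n_radii) array."""
    cx, cy = center_xy
    angles = np.linspace(angle_range[0], angle_range[1], n_angles)
    radii = np.arange(n_radii, dtype=float)
    xs = cx + radii[None, :] * np.cos(angles[:, None])
    ys = cy - radii[None, :] * np.sin(angles[:, None])  # image y axis points down
    # Nearest-neighbour sampling, clipped to the image bounds.
    xi = np.clip(np.rint(xs).astype(int), 0, image.shape[1] - 1)
    yi = np.clip(np.rint(ys).astype(int), 0, image.shape[0] - 1)
    return image[yi, xi]

def steepest_gradient_edge(profile):
    """Per ray, the radius index just before the largest absolute intensity change."""
    grad = np.diff(profile.astype(float), axis=1)
    return np.argmax(np.abs(grad), axis=1)

# Synthetic "femoral head": a bright disk of radius 30 centred in a 128 x 128 image.
yy, xx = np.mgrid[0:128, 0:128]
disk = ((xx - 64) ** 2 + (yy - 64) ** 2 <= 30 ** 2).astype(float)
profile = radial_projection(disk, center_xy=(64, 64))
edges = steepest_gradient_edge(profile)
```

On real radiographs the rays would first be smoothed, and a peak detector could replace `steepest_gradient_edge` for the acetabular roof, in line with the improvement suggested in the abstract.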