BCLIP-ADer
A Bayesian Prompt Contrastive Language-Image Pretraining Method for Catenary Component Anomaly Detection in Electrified Railways
Haonan Yang (Southwest Jiaotong University)
Keting Hu (Southwest Jiaotong University)
Hui Wang (Southwest Jiaotong University)
Weijia Hong (Southwest Jiaotong University)
Xufan Wang (Southwest Jiaotong University)
Hongrui Wang (TU Delft - Civil Engineering & Geosciences, Southwest Jiaotong University)
Yang Song (Southwest Jiaotong University)
Zhigang Liu (Southwest Jiaotong University)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Intelligent detection of catenary support components is an essential part of electrified railway operation and maintenance, yet it still faces several critical challenges: (1) abnormal (negative) component samples are severely limited; (2) component anomalies are highly diverse and exhibit heterogeneous visual characteristics; and (3) existing models generally perform poorly on previously unseen anomaly types. To address these issues, this paper proposes BCLIP-ADer, a few-shot anomaly detection model for catenary components built upon a Bayesian prompt contrastive vision-language pretraining framework. Specifically, a Bayesian prompt flow module (PFM) regularizes the text prompt space through a jointly learned image-specific feature distribution (ISFD) and image-agnostic feature distribution (IAFD), thereby mitigating the drop in detection performance on unseen component anomalies. Monte Carlo sampling over these learned distributions then generates diverse text prompts, giving more comprehensive coverage of the prompt space. In addition, a cross-modal feature refinement module (CFRM) aligns the dynamic text embeddings more effectively with fine-grained image features, improving component-level anomaly detection. Finally, extensive experiments on a UAV-based catenary dataset (CSCUD) demonstrate the effectiveness and superiority of the proposed approach: under few-shot conditions, it achieves image-level AUROC, AP, and maximum F1 (I-AUROC/I-AP/I-F1_max) scores of 94.2/93.2/93.1.
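The abstract describes the PFM only at a high level and does not state the form of the ISFD and IAFD. The pure-Python sketch below illustrates the general idea of scoring an image by Monte Carlo sampling over a learned prompt distribution. It rests on several assumptions that are not taken from the paper: the prompt context is a diagonal Gaussian whose mean is the sum of an image-agnostic mean and an image-specific mean; the image-specific mean is a fixed scaling of the image feature (the hypothetical `image_specific_mean`); and the CLIP text encoder is replaced by a hypothetical `toy_text_encoder` that simply adds a class-token embedding. All function names, the temperature of 0.07, and the sample count are illustrative choices. This is not the authors' implementation, and the CFRM is not modeled.

```python
import math
import random


def image_specific_mean(image_feat, scale=0.1):
    """Hypothetical ISFD mean: a scaled copy of the image feature.

    The paper learns this mapping; a fixed scaling is used here only for illustration.
    """
    return [scale * x for x in image_feat]


def sample_prompt_context(mu_agnostic, mu_specific, log_sigma, rng):
    """Draw one prompt context vector with the reparameterization trick.

    Assumes (hypothetically) a diagonal Gaussian whose mean is the sum of the
    image-agnostic and image-specific means: z = mu + sigma * eps, eps ~ N(0, 1).
    """
    return [
        ma + ms + math.exp(ls) * rng.gauss(0.0, 1.0)
        for ma, ms, ls in zip(mu_agnostic, mu_specific, log_sigma)
    ]


def toy_text_encoder(context, class_token):
    """Hypothetical stand-in for a CLIP text encoder: context + class-token embedding."""
    return [c + t for c, t in zip(context, class_token)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def anomaly_score(image_feat, mu_agnostic, log_sigma, normal_token, abnormal_token,
                  num_samples=32, temperature=0.07, seed=0):
    """Monte Carlo estimate of P(abnormal | image), averaged over sampled prompts."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    mu_specific = image_specific_mean(image_feat)
    total = 0.0
    for _ in range(num_samples):
        # One sampled context is shared by the "normal" and "abnormal" prompts
        context = sample_prompt_context(mu_agnostic, mu_specific, log_sigma, rng)
        logit_normal = cosine(image_feat, toy_text_encoder(context, normal_token)) / temperature
        logit_abnormal = cosine(image_feat, toy_text_encoder(context, abnormal_token)) / temperature
        # Two-way softmax over {normal, abnormal}, shifted by the max for numerical stability
        m = max(logit_normal, logit_abnormal)
        e_n = math.exp(logit_normal - m)
        e_a = math.exp(logit_abnormal - m)
        total += e_a / (e_n + e_a)
    return total / num_samples


if __name__ == "__main__":
    normal_token = [1.0, 0.0, 0.0, 0.0]
    abnormal_token = [0.0, 1.0, 0.0, 0.0]
    mu_agnostic = [0.0, 0.0, 0.1, 0.1]
    log_sigma = [-2.0] * 4  # sigma = exp(-2), roughly 0.135
    for name, feat in [("normal-like", [1.0, 0.0, 0.0, 0.0]),
                       ("abnormal-like", [0.0, 1.0, 0.0, 0.0])]:
        score = anomaly_score(feat, mu_agnostic, log_sigma, normal_token, abnormal_token)
        print(f"{name}: {score:.3f}")
```

Averaging the class probability over many sampled prompts, rather than scoring with one fixed prompt, is one simple way to realize the broader prompt-space coverage that the abstract attributes to Monte Carlo sampling. In a trained model the means and `log_sigma` would be learned rather than hand-set as in this toy example.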
Files
File under embargo until 04-11-2026