Pull-based Scraping vs. eBPF Auto-instrumentation: Overhead, Coverage, and Trade-offs
V. Ilchev (TU Delft - Electrical Engineering, Mathematics and Computer Science)
S.M.B.S. Samarakoon Mudiyanselage – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Nitinder Mohan – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Jérémie Decouchant – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Cloud-native 5G core networks turn network functions into containerised microservices, which simplifies their management but fragments their observability across multiple telemetry layers. Monitoring these systems requires balancing visibility against the overhead that monitoring imposes on the control plane being observed. This paper evaluates two fundamentally different collection paradigms, pull-based scraping via Prometheus and eBPF auto-instrumentation via Grafana Beyla, on a live Open5GS 5G core deployed on a three-node Kubernetes cluster. Each stack is deployed both in isolation and in combination. Resource overhead is quantified across multiple granularity settings, scalability is measured by varying the number of Network Functions (NFs) monitored at a time, and fault-detection coverage is assessed over 22 injected scenarios across five fault classes, using Chaos Mesh for controlled injection. Prometheus incurs substantially higher monitoring-stack overhead. Beyla's sampling rate has a negligible effect on cost, because kernel uprobes fire on every HTTP/2 library call regardless of the sampling decision. For fault observability, Beyla flags all 22 fault types in at least one of the three runs, while Prometheus misses only one (NRF cascade failure). However, only 10 of the 22 faults are detected reliably by both methods across all three runs. Per-run reliability favours Prometheus (87.9% vs. 81.8%). We conclude that Beyla offers broader fault-type coverage at lower overhead, but Prometheus provides more consistent detection per individual injection.
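The abstract relies on three distinct detection measures: coverage (a fault is flagged in at least one run), reliable detection (flagged in every run), and per-run reliability (the fraction of individual injections flagged). The following minimal Python sketch illustrates how they differ. The function names and the toy detection matrix are hypothetical and do not reproduce the paper's data. The final check rests on an assumption that the abstract does not state: that per-run reliability is computed over 22 faults × 3 runs = 66 injections, in which case 58 and 54 detections would reproduce the reported 87.9% and 81.8%.

```python
from typing import Dict, List

# Maps a fault name to one boolean per run: was the fault flagged in that run?
Detections = Dict[str, List[bool]]


def coverage(detections: Detections) -> int:
    """Number of faults flagged in at least one run."""
    return sum(any(runs) for runs in detections.values())


def reliable(detections: Detections) -> int:
    """Number of faults flagged in every run."""
    return sum(all(runs) for runs in detections.values())


def per_run_rate(detections: Detections) -> float:
    """Fraction of individual injections (fault x run) that were flagged."""
    total = sum(len(runs) for runs in detections.values())
    hits = sum(sum(runs) for runs in detections.values())
    return hits / total


# Hypothetical toy data (4 faults, 3 runs), NOT taken from the paper.
toy: Detections = {
    "pod-kill":    [True, True, True],
    "net-delay":   [True, False, False],
    "cpu-stress":  [False, True, True],
    "nrf-cascade": [False, False, False],
}

toy_coverage = coverage(toy)      # 3 faults seen at least once
toy_reliable = reliable(toy)      # 1 fault seen in every run
toy_rate = per_run_rate(toy)      # 6 of 12 injections flagged

# Assumption: 22 faults x 3 runs = 66 injections per tool. The hit counts
# 58 and 54 are inferred from the reported percentages, not stated in the text.
INJECTIONS = 22 * 3
prometheus_pct = round(100 * 58 / INJECTIONS, 1)
beyla_pct = round(100 * 54 / INJECTIONS, 1)
```

The toy data shows why the measures can disagree: a tool may flag many fault types at least once (high coverage) while missing them in individual runs (lower per-run rate), which matches the trade-off the abstract reports between Beyla and Prometheus.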