The Good, the Bad, and the Scanned: An Empirical Study of the Origins of Internet-wide Scanners
Abstract
Security researchers and industry firms employ Internet-wide scanning for information collection, vulnerability detection, and security evaluation, while cybercriminals use it to find and attack unsecured devices. Internet scanning plays a considerable role in threat detection and response, and in cyber threat intelligence. We adopt a data-driven approach, analyzing a large dataset of network traffic collected through a network telescope, to identify the origins of Internet scanners and their affiliations. We analyze two monthly traffic snapshots from two different years (June 2023 and February 2024), each comprising approximately 10 billion packets. We also provide a methodology for collecting and aggregating data on known (institutional) scanners.
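Aggregating known scanners generally means mapping each telescope source IP to an organization through that organization's published address ranges, then summing traffic per organization. The sketch below shows one simple way to do this with Python's standard `ipaddress` module. It is not the paper's pipeline. The organization names (`ExampleScanOrg`, `ResearchProjectX`) are hypothetical, and the CIDR blocks are RFC 5737 documentation prefixes, not real scanner ranges.

```python
import ipaddress
from collections import Counter

# Hypothetical known-scanner list: organization -> published CIDR ranges.
# These are RFC 5737 documentation prefixes, NOT real scanner ranges.
KNOWN_SCANNERS = {
    "ExampleScanOrg": ["192.0.2.0/24"],
    "ResearchProjectX": ["198.51.100.0/25", "203.0.113.16/28"],
}


def build_index(known):
    """Flatten {org: [cidr, ...]} into a list of (network, org) pairs."""
    return [
        (ipaddress.ip_network(cidr), org)
        for org, cidrs in known.items()
        for cidr in cidrs
    ]


def label_source(ip, index):
    """Return the organization whose range contains `ip`, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for net, org in index:
        if addr in net:  # False (not an error) if the IP versions differ
            return org
    return None


def traffic_by_org(packets_per_ip, index):
    """Sum packet counts per organization; unmatched sources go to 'unknown'."""
    totals = Counter()
    for ip, count in packets_per_ip.items():
        totals[label_source(ip, index) or "unknown"] += count
    return totals
```

For example, `traffic_by_org({"192.0.2.7": 500, "203.0.113.20": 300, "198.51.100.200": 200}, build_index(KNOWN_SCANNERS))` attributes 500 packets to `ExampleScanOrg`, 300 to `ResearchProjectX`, and 200 to `unknown`. The linear scan is fine for a short list of ranges. With thousands of prefixes, a longest-prefix-match structure such as a radix trie would scale better.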
The study reveals that a small number of source IP addresses account for almost all of the traffic volume: 1% of addresses contributed 97.38% of total traffic in June 2023 and 96.65% in February 2024. Traffic analysis identifies 40 to 44 known scanners per month, accounting for 0.36 to 0.62% of source IPs and 50.86 to 51.31% of total telescope traffic. Notably, just seven to ten organizations are responsible for around half of the total telescope traffic each month. The study also identifies 34 commercial bots with a negligible footprint, accounting for at most 0.25% of source IPs and less than 0.01% of total traffic per month. Mirai probes contribute 1 to 1.5% of monthly scanning traffic, with a burst in the number of source IP addresses in 2023. Similarly, traffic from Tor exit nodes is small, constituting 0.01% of overall darknet traffic and 0.04 to 0.06% of source IPs per month. The study also reports on the current usage of scanning software such as ZMap and Masscan, finding that around 40% of the traffic volume in each month carries the ZMap signature. Lastly, from a research perspective, we highlight the need for further mutual exchange of threat intelligence among defenders, for extending the data collection period, and for establishing a pipeline for the continuous discovery and integration of known scanners. Together, these would help efficiently differentiate institutional scanners from malicious actors in an evolving cyber landscape.
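The two kinds of measurement summarized above can be sketched in a few lines of Python. The first is traffic concentration, meaning the share of packets sent by the top 1% of source IPs. The second is tool fingerprinting. The sketch assumes two heuristics that are widely used in network-telescope studies, though the abstract does not spell out which ones this paper uses. ZMap's default probes set the IPv4 identification field to 54321. Mirai-style scanners set the TCP sequence number equal to the destination IPv4 address. The `Probe` record and its field names are hypothetical simplifications, not a real capture schema.

```python
import ipaddress
from collections import Counter
from dataclasses import dataclass


@dataclass
class Probe:
    """Hypothetical minimal packet record (not a real capture format)."""
    src: str
    dst: str
    ip_id: int
    tcp_seq: int


ZMAP_IP_ID = 54321  # IPv4 identification value used by ZMap's default probes


def is_zmap(p):
    return p.ip_id == ZMAP_IP_ID


def is_mirai(p):
    # Mirai-style heuristic: TCP sequence number equals the destination IPv4 address.
    return p.tcp_seq == int(ipaddress.IPv4Address(p.dst))


def top_share(packets_per_ip, fraction=0.01):
    """Share of all packets sent by the top `fraction` of source IPs.

    The number of IPs is rounded down, but at least one IP is always counted.
    """
    counts = sorted(packets_per_ip.values(), reverse=True)
    k = max(1, int(len(counts) * fraction))
    return sum(counts[:k]) / sum(counts)


def summarize(probes):
    """Concentration and per-signature packet shares for a list of probes."""
    if not probes:
        raise ValueError("no probes to summarize")
    per_ip = Counter(p.src for p in probes)
    total = len(probes)
    return {
        "top1pct_share": top_share(per_ip),
        "zmap_share": sum(map(is_zmap, probes)) / total,
        "mirai_share": sum(map(is_mirai, probes)) / total,
    }
```

On four toy probes where one source sends two ZMap-marked packets and another sends one Mirai-style packet, `summarize` reports a top-1% share of 0.5, a ZMap share of 0.5, and a Mirai share of 0.25. With only three sources, the "top 1%" falls back to the single heaviest source. Header-field fingerprints like these are cheap to evaluate per packet, but any tool can be reconfigured to avoid them, so the shares they produce should be read as lower bounds.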