Despite the Tor network's strong anonymity guarantees, some onion services unintentionally leak operational metadata through misconfigurations. This thesis explores one such leakage: Apache's mod_status diagnostic endpoint, which exposes real-time server information not intended for public access.
The root cause of this exposure lies in how Tor delivers traffic to the web server. The local Tor process forwards requests sent to an onion address, so they appear to originate from localhost, and Apache inadvertently grants access to debugging pages that were intended to be private. While operational security communities recognize this behavior as a known misconfiguration, its potential for attribution remains largely unexplored in academic literature.
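The access decision can be illustrated with a minimal Python sketch. It assumes a loopback-only rule of the kind Apache's `Require local` expresses and a Tor process that connects to the web server over 127.0.0.1. The function name `looks_local` is hypothetical, and the code is a simplified model of the check, not Apache's implementation.

```python
import ipaddress


def looks_local(peer_addr: str) -> bool:
    """Simplified model of a loopback-only access rule (hypothetical helper).

    Returns True when the TCP peer address is a loopback address
    (127.0.0.0/8 for IPv4, ::1 for IPv6).
    """
    return ipaddress.ip_address(peer_addr).is_loopback


# A direct visitor from the public internet is rejected by the rule.
print(looks_local("203.0.113.7"))

# A request arriving through the onion service is handed to the web server
# by the local Tor process, so the peer address the server sees is loopback
# and the rule grants access, regardless of who sent the request.
print(looks_local("127.0.0.1"))
```

The rule only sees the address of the last hop, which is why an access restriction that is safe on the clearnet stops protecting the page once the same server is published as an onion service.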
To study this problem systematically, we analyzed over 210,000 server-status snapshots from more than 12,500 onion services, collected between 2019 and 2025 through continuous crawling of the Tor network. We began by developing a custom parser to extract the full contents of these pages, providing structured access to metadata such as public IP addresses, virtual hostnames (VHost), and client request logs. We then applied two attribution techniques: direct attribution via exposed infrastructure details and indirect attribution based on recurring VHost values and configuration similarities, such as shared version strings or runtime behavior. The latter was modeled as a graph to investigate clusters of likely co-hosted services.
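The graph-based grouping used for indirect attribution can be sketched as follows. This is a minimal illustration rather than the implementation used in this thesis: it assumes each onion service has already been reduced to a set of observed VHost values, links two services whenever they share at least one value, and returns the connected components with more than one member. The function name and the sample data are hypothetical, and configuration similarities such as version strings are left out.

```python
from collections import defaultdict


def cluster_by_shared_values(observations):
    """Group onion services that share at least one attribute value.

    observations: dict mapping an onion address to a set of values
    (here: VHost names) seen on its server-status page.
    Returns a list of sets, one per cluster with two or more services.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # value -> first onion service that exposed it
    for onion, values in observations.items():
        find(onion)  # register the service even if it shares nothing
        for value in values:
            if value in first_seen:
                union(first_seen[value], onion)
            else:
                first_seen[value] = onion

    clusters = defaultdict(set)
    for onion in observations:
        clusters[find(onion)].add(onion)
    return [members for members in clusters.values() if len(members) > 1]


# Hypothetical sample data: a, b, and c are linked transitively.
sample = {
    "a.onion": {"shop.example"},
    "b.onion": {"shop.example", "mail.example"},
    "c.onion": {"mail.example"},
    "d.onion": {"other.example"},
}
print(cluster_by_shared_values(sample))
```

Because the links are transitive, two services can land in the same cluster without sharing a value directly, so in practice very common values such as default hostnames would need to be filtered out before clustering.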
The results show that server-status exposure is both widespread and persistent. In many cases, the leaked data enabled reliable deanonymization or revealed backend infrastructure shared across multiple services. In total, we identified 378 clusters linking 7,681 domains, which account for over 61% of all onion services with a server-status page. Of these, 31 clusters covering 1,404 services included public IP addresses, which offer rare but unambiguous evidence for attribution.
This thesis serves as a feasibility study, demonstrating that attribution can be achieved systematically without exploiting Tor or undermining its anonymity mechanisms. We show that this unintentional exposure offers an opportunistic, passive, and scalable source of investigative insight. The findings highlight the untapped potential of server-status pages as a practical, low-cost means of uncovering otherwise hidden dark web infrastructure, one that can be easily integrated into existing investigative workflows.