Out of Sight, Out of Mind: A Comprehensive Study on the Prevalence and Security Impact of Orphaned Web Pages

Master Thesis (2021)
Author(s)

S.R.G. Pletinckx (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Tobias Fiebig – Mentor (TU Delft - Information and Communication Technology)

Z. Erkin – Graduation committee member (TU Delft - Cyber Security)

S. Picek – Coach (TU Delft - Cyber Security)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Stijn Pletinckx
Publication Year
2021
Language
English
Graduation Date
22-07-2021
Awarding Institution
Delft University of Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Security misconfigurations and neglected updates commonly leave systems vulnerable. From default passwords to unpatched software, these pitfalls lead to the compromise of many systems, such as websites and databases. Because they often stem from human error, these misconfigurations are difficult to avoid, which is why they are repeatedly seen in the wild. Especially in the context of websites, we often find pages that were forgotten, that is, pages that were left online after they had served their purpose and were never updated thereafter.
In this thesis, we introduce the first methodology to detect such forgotten, or orphaned, web pages, a type of misconfiguration that has received little attention. We combine the historical data set of the Internet Archive with active measurements to identify pages that can no longer be reached via a path from the index page, yet remain accessible through their specific URL. We show the efficacy of our approach and the real-world relevance of orphaned web pages by applying it to a sample of 100,000 domains from the Tranco Top 1M. This type of misconfiguration can pose a serious threat, as it leaves forgotten and unmaintained web pages online that may not have received security updates.
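The definition of an orphaned page given above can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `find_orphan_candidates`, its parameters, and the example URLs are all hypothetical, and the archive lookup, the crawl from the index page, and the liveness check are passed in as plain inputs rather than performed against the Internet Archive or a live site.

```python
from typing import Callable, Iterable, Set


def find_orphan_candidates(
    archived_urls: Iterable[str],
    reachable_urls: Iterable[str],
    is_alive: Callable[[str], bool],
) -> Set[str]:
    """Return URLs that were seen historically, are no longer linked
    from the index page, and still respond when requested directly.

    archived_urls  -- URLs known from a historical data set (assumed input)
    reachable_urls -- URLs found by crawling from the current index page
    is_alive       -- hypothetical probe that reports whether a URL still responds
    """
    # Pages that existed in the past but are no longer reachable by links.
    unlinked = set(archived_urls) - set(reachable_urls)
    # Of those, keep only the ones that are still served.
    return {url for url in unlinked if is_alive(url)}


# Toy data standing in for the archive, a crawl, and active measurements.
archived = {
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/promo-2009",
    "https://example.com/gone",
}
reachable = {"https://example.com/", "https://example.com/about"}
responding = {
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/promo-2009",
}

orphans = find_orphan_candidates(archived, reachable, lambda u: u in responding)
print(sorted(orphans))  # ['https://example.com/promo-2009']
```

In this toy example, `/gone` is excluded because it no longer responds, and `/promo-2009` is flagged because it is still served although nothing links to it. A real measurement would need further filtering, for instance of error pages that still return a success status, which this sketch does not attempt.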
In our measurements, we find 1,953 orphaned pages, some of which are 20 years old, on 907 unique domains. In our subsequent security analysis, we find that these pages are significantly (p < 0.01 using a χ² test) more likely to contain some vulnerabilities than maintained pages: 7.1% of orphaned pages suffer from simple XSS vulnerabilities, while only 0.9% of maintained pages are vulnerable to this type of attack. We encounter similar patterns for adherence to security best practices, such as the use of Content Security Policies (CSPs) and the setting of HTTP security headers. To allow researchers to reproduce our results and practitioners to scrutinize their own pages, we provide an implementation of our methodology as open source software.
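The kind of significance test mentioned above can be sketched as a Pearson χ² test on a 2×2 table (page type versus vulnerable or not). The counts below are hypothetical, chosen only to mirror the reported 7.1% and 0.9% rates; the actual sample sizes are not stated in this abstract. The function name `chi_square_2x2` is also an assumption, and no continuity correction is applied, which may differ from the procedure used in the thesis.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) for the 2x2 contingency table

                    vulnerable   not vulnerable
        group 1         a              b
        group 2         c              d

    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: 71 of 1,000 orphaned pages and 9 of 1,000
# maintained pages vulnerable (7.1% vs 0.9%).
statistic, p_value = chi_square_2x2(71, 929, 9, 991)
print(f"chi2 = {statistic:.2f}, significant at 0.01: {p_value < 0.01}")
```

With these illustrative counts, the difference is significant at the 0.01 level. The outcome for the real data depends on the actual sample sizes.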

Files

Stijn_Pletinckx_Master_Thesis.... (pdf)
(pdf | 4.99 Mb)
- Embargo expired on 15-01-2022
License info not available