Wouter Joosen | TU Delft Repository

A Run a Day Won't Keep the Hacker Away: Inference Attacks on Endpoint Privacy Zones in Fitness Tracking Social Networks

Conference paper (2022) - Karel Dhondt, Victor Le Pochat, VOULIMENEAS ALEXIOS, Wouter Joosen, Stijn Volckaert

Fitness tracking social networks such as Strava allow users to record sports activities and share them publicly. Sharing encourages peer interaction but also constitutes a risk, because an activity's start or finish may inadvertently reveal privacy-sensitive locations such as a home or workplace. To mitigate this risk, networks introduced endpoint privacy zones (EPZs), which hide track portions around protected locations. In this paper, we show that EPZ implementations of major services remain vulnerable to inference attacks that significantly reduce the effective anonymity provided by the EPZ, and even reveal the protected location. Our attack leverages distance information leaked in activity metadata, street grid data, and the locations of the entry points into the EPZ. This yields a constrained search space where we use regression analysis to predict protected locations. Our evaluation on 1.4 million Strava activities shows that our attack discovers the protected location for up to 85% of EPZs. Larger EPZs reduce the performance of our attack, while geographically dispersed activities in sparser street grids yield better performance. We propose six countermeasures, that, however, come with a usability trade-off, and responsibly disclosed our findings and countermeasures to the major networks. ...

Machine Learning Meets Data Modification

The Potential of Pre-processing for Privacy Enchancement

Book chapter (2022) - Giuseppe Garofalo, Manel Slokom, Davy Preuveneers, Wouter Joosen, Martha Larson

We explore how data modification can enhance privacy by examining the connection between data modification and machine learning. Specifically, machine learning “meets” data modification in two ways. First, data modification can protect the data that is used to train machine learning models focusing it on the intended use and inhibiting unwanted inference. Second, machine learning can provide new ways of creating modified data. In this chapter, we discuss data modification approaches, applied during data pre-processing, that are suited for online data sharing scenarios. Specifically, we define two scenarios “User data sharing” and “Data set sharing” and describe the threat models associated with each scenario and related privacy threats. We then survey the landscape of privacy-enhancing data modification techniques that can be used to counter these threats. The picture that emerges is that data modification approaches hold promise to enhance privacy, and can be used alongside of conventional cryptographic approaches. We close with an outlook on future directions focusing on new types of data, the relationship among privacy, and the importance of taking an interdisciplinary approach to data modification for privacy enhancement. ...

Helping hands: Measuring the impact of a large threat intelligence sharing community

Conference paper (2022) - X.B. Bouwman, Victor Le Pochat, Pawel Foremski, Tom Van Goethem, C. Hernandez Ganan, Giovane C.M. Moura, Samaneh Tajalizadehkhoob, Wouter Joosen, M.J.G. van Eeten

We tracked the largest volunteer security information sharing community known to date: the COVID-19 Cyber Threat Coalition, with over 4,000 members. This enabled us to address long-standing questions on threat information sharing. First, does collaboration at scale lead to better coverage? And second, does making threat data freely available improve the ability of defenders to act? We found that the CTC mostly aggregated existing industry sources of threat information. User-submitted domains often did not make it to the CTC's blocklist as a result of the high threshold posed by its automated quality assurance using VirusTotal. Although this ensured a low false positive rate, it also caused the focus of the blocklist to drift away from domains related to COVID-19 (1.4%-3.6%) to more generic abuse, such as phishing, for which established mitigation mechanisms already exist. However, in the slice of data that was related to COVID-19, we found promising evidence of the added value of a community like the CTC: just 25.1% of these domains were known to existing abuse detection infrastructures at time of listing, as compared to 58.4% of domains on the overall blocklist. From the unique experiment that the CTC represented, we draw three lessons for future threat data sharing initiatives. ...

Open-World Network Intrusion Detection

Book chapter (2022) - Vera Rimmer, Azqa Nadeem, Sicco Verwer, Davy Preuveneers, Wouter Joosen

This chapter contributes to the ongoing discussion of strengthening security by applying AI techniques in the scope of intrusion detection. The focus is set on open-world detection of attacks through data-driven network traffic analysis. This research topic is complementary to the earlier chapter on intelligent malware detection. In this chapter, we revisit the foundations of machine learning-based solutions for network security, which aim to make network defense tools more autonomous, adaptive, proactive and responsive. Specifically, we give a comprehensive introduction to the research on anomaly detection for network intrusion detection – that is, defensive schemes that do not assume complete prior knowledge of malicious patterns and instead learn the notion of normality from benign traffic. Along with outlining the recent advances in the field, we provide insights and reflect on the current limitations and research challenges. Therefore, this chapter presents compelling research opportunities to advance machine learning techniques in network security and push the boundaries of open-world network intrusion detection. ...

TRANCO: A Research-Oriented Top Sites Ranking Hardened Against Manipulation

Conference paper (2019) - Victor Le Pochat, Tom Van Goethem, Samaneh Tajalizadehkhoob, Wouter Joosen

In order to evaluate the prevalence of security and privacy practices on a representative sample of the Web, researchers rely on website popularity rankings such as the Alexa list. While the validity and representativeness of these rankings are rarely questioned, our findings show the contrary: we show for four main rankings how their inherent properties (similarity, stability, representativeness, responsiveness and benignness) affect their composition and therefore potentially skew the conclusions made in studies. Moreover, we find that it is trivial for an adversary to manipulate the composition of these lists. We are the first to empirically validate that the ranks of domains in each of the lists are easily altered, in the case of Alexa through as little as a single HTTP request. This allows adversaries to manipulate rankings on a large scale and insert malicious domains into whitelists or bend the outcome of research studies to their will. To overcome the limitations of such rankings, we propose improvements to reduce the fluctuations in list composition and guarantee better defenses against manipulation. To allow the research community to work with reliable and reproducible rankings, we provide TRANCO, an improved ranking that we offer through an online service available at https://tranco-list.eu. ...

Herding Vulnerable Cats

A Statistical Approach to Disentangle Joint Responsibility for Web Security in Shared Hosting

Conference paper (2017) - Samaneh Tajalizadehkhoob, Tom Van Goethem, Maciej Korczynski, Arman Noroozian, Rainer Böhme, Tyler Moore, Wouter Joosen, Michel van Eeten

Hosting providers play a key role in fighting web compromise, but their ability to prevent abuse is constrained by the security practices of their own customers. Shared hosting, offers a unique perspective since customers operate under restricted privileges and providers retain more control over configurations. We present the first empirical analysis of the distribution of web security features and software patching practices in shared hosting providers, the influence of providers on these security practices, and their impact on web compromise rates. We construct provider-level features on the global market for shared hosting -- containing 1,259 providers -- by gathering indicators from 442,684 domains. Exploratory factor analysis of 15 indicators identifies four main latent factors that capture security efforts: content security, webmaster security, web infrastructure security and web application security. We confirm, via a fixed-effect regression model, that providers exert significant influence over the latter two factors, which are both related to the software stack in their hosting environment. Finally, by means of GLM regression analysis of these factors on phishing and malware abuse, we show that the four security and software patching factors explain between 10% and 19% of the variance in abuse at providers, after controlling for size. For web-application security for instance, we found that when a provider moves from the bottom 10% to the best-performing 10%, it would experience 4 times fewer phishing incidents. We show that providers have influence over patch levels--even higher in the stack, where CMSes can run as client-side software--and that this influence is tied to a substantial reduction in abuse levels. ...