M.T. Korczynski | TU Delft Repository

Rotten apples or bad harvest? What we are measuring when we are measuring abuse

Journal article (2018) - Samaneh Tajalizadehkhoob, Rainer Böhme, Carlos Gañán, Maciej Korczyński, Michel Van Eeten

Internet security and technology policy research regularly uses technical indicators of abuse to identify culprits and to tailor mitigation strategies. As a major obstacle, current inferences from abuse data that aim to characterize providers with poor security practices often use a naive normalization of abuse (abuse counts divided by network size) and do not take into account other inherent or structural properties of providers. Even the size estimates are subject to measurement errors relating to attribution, aggregation, and various sources of heterogeneity. More precise indicators are costly to measure at Internet scale. We address these issues for the case of hosting providers with a statistical model of the abuse data generation process, using phishing sites in hosting networks as a case study. We decompose error sources and then estimate key parameters of the model, controlling for heterogeneity in size and business model. We find that 84% of the variation in abuse counts across 45,358 hosting providers can be explained with structural factors alone. Informed by the fitted model, we systematically select and enrich a subset of 105 homogeneous “statistical twins” with additional explanatory variables, unreasonable to collect for all hosting providers. We find that abuse is positively associated with the popularity of websites hosted and with the prevalence of popular content management systems. Moreover, hosting providers who charge higher prices (after controlling for level differences between countries) witness less abuse. These structural factors together explain a further 77% of the remaining variation. This calls into question premature inferences from raw abuse indicators about the security efforts of actors, and suggests the adoption of similar analysis frameworks in all domains where network measurement aims at informing technology policy. ...

Inferring the Security Performance of Providers from Noisy and Heterogenous Abuse Datasets

Conference paper (2017) - Arman Noroozian, Michael Ciere, Maciej Korczynski, Samaneh Tajalizadehkhoob, Michel van Eeten

Make notifications great again: learning how to notify in the age of large-scale vulnerability scanning

Conference paper (2017) - F.O. Cetin, Carlos Ganán, Maciej Korczynski, Michel van Eeten

As large-scale vulnerability detection becomes more feasible, it also increases the urgency to find effective largescale notification mechanisms to inform the affected parties. Researchers, CERTs, security companies and other organizations with vulnerability data have a variety of options to identify, contact and communicate with the actors responsible for the affected system or service. A lot of things can – and do – go wrong. It might be impossible to identify the appropriate recipient of the notification, the message might not be trusted by the recipient, it might be overlooked or ignored or misunderstood. Such problems multiply as the volume of notifications increases. In this paper, we undertake several large-scale notification campaigns for a vulnerable configuration of authoritative nameservers. We investigate three issues: What is the most effective way to reach the affected parties? What communication path mobilizes the strongest incentive for remediation? And finally, what is the impact of providing recipients a mechanism to actively demonstrate the vulnerability for their own system, rather than sending them the standard static notification message. We find that retrieving contact information at scale is highly problematic, though there are different degrees of failure for different mechanisms. For those parties who are reached, notification significantly increases remediation rates. Reaching out to nameserver operators directly had better results than going via their customers, the domain owners. While the latter, in principle, have a stronger incentive to care and their request for remediation would trigger the commercial incentive of the operator to keep its customers happy, this communication path turned out to have slightly worse remediation rates. Finally, we find no evidence that vulnerability demonstrations did better than static messages. In fact, few recipients engaged with the demonstration website. ...

As large-scale vulnerability detection becomes more feasible, it also increases the urgency to find effective largescale notification mechanisms to inform the affected parties. Researchers, CERTs, security companies and other organizations with vulnerability data have a variety of options to identify, contact and communicate with the actors responsible for the affected system or service. A lot of things can – and do – go wrong. It might be impossible to identify the appropriate recipient of the notification, the message might not be trusted by the recipient, it might be overlooked or ignored or misunderstood. Such problems multiply as the volume of notifications increases. In this paper, we undertake several large-scale notification campaigns for a vulnerable configuration of authoritative nameservers. We investigate three issues: What is the most effective way to reach the affected parties? What communication path mobilizes the strongest incentive for remediation? And finally, what is the impact of providing recipients a mechanism to actively demonstrate the vulnerability for their own system, rather than sending them the standard static notification message. We find that retrieving contact information at scale is highly problematic, though there are different degrees of failure for different mechanisms. For those parties who are reached, notification significantly increases remediation rates. Reaching out to nameserver operators directly had better results than going via their customers, the domain owners. While the latter, in principle, have a stronger incentive to care and their request for remediation would trigger the commercial incentive of the operator to keep its customers happy, this communication path turned out to have slightly worse remediation rates. Finally, we find no evidence that vulnerability demonstrations did better than static messages. In fact, few recipients engaged with the demonstration website.

Using loops observed in traceroute to infer the ability to spoof

Conference paper (2017) - Qasim Lone, Matthew Luckie, Maciej Korczyński, Michel Van Eeten

Despite source IP address spoofing being a known vulnerability for at least 25 years, and despite many efforts to shed light on the problem, spoofing remains a popular attack method for redirection, amplification, and anonymity. To defeat these attacks requires operators to ensure their networks filter packets with spoofed source IP addresses, known as source address validation (SAV), best deployed at the edge of the network where traffic originates. In this paper, we present a new method using routing loops appearing in traceroute data to infer inadequate SAV at the transit provider edge, where a provider does not filter traffic that should not have come from the customer. Our method does not require a vantage point within the customer network. We present and validate an algorithm that identifies at Internet scale which loops imply a lack of ingress filtering by providers. We found 703 provider ASes that do not implement ingress filtering on at least one of their links for 1,780 customer ASes. Most of these observations are unique compared to the existing methods of the Spoofer and Open Resolver projects. By increasing the visibility of the networks that allow spoofing, we aim to strengthen the incentives for the adoption of SAV. ...

Anomaly detection through information sharing under different topologies

Journal article (2017) - Lazaros K. Gallos, M.T. Korczynski, Nina H. Fefferman

Early detection of traffic anomalies in networks increases the probability of effective intervention/mitigation actions, thereby improving the stability of system function. Centralized methods of anomaly detection are subject to inherent constraints: (1) they create a communication burden on the system, (2) they impose a delay in detection while information is being gathered, and (3) they require some trust and/or sharing of traffic information patterns. On the other hand, truly parallel, distributed methods are fast and private but can observe only local information. These methods can easily fail to see the “big picture” as they focus on only one thread in a tapestry. A recently proposed algorithm, Distributed Intrusion/Anomaly Monitoring for Nonparametric Detection (DIAMoND), addressed these problems by using parallel surveillance that included dynamic detection thresholds. These thresholds were functions of nonparametric information shared among network neighbors. Here, we explore the influence of network topology and patterns in normal traffic flow on the performance of the DIAMoND algorithm. We contrast performance to a truly parallel, independent surveillance system. We show that incorporation of nonparametric data improves anomaly detection capabilities in most cases, without incurring the practical problems of fully parallel network surveillance. ...

Herding Vulnerable Cats

A Statistical Approach to Disentangle Joint Responsibility for Web Security in Shared Hosting

Conference paper (2017) - Samaneh Tajalizadehkhoob, Tom Van Goethem, Maciej Korczynski, Arman Noroozian, Rainer Böhme, Tyler Moore, Wouter Joosen, Michel van Eeten

Hosting providers play a key role in fighting web compromise, but their ability to prevent abuse is constrained by the security practices of their own customers. Shared hosting, offers a unique perspective since customers operate under restricted privileges and providers retain more control over configurations. We present the first empirical analysis of the distribution of web security features and software patching practices in shared hosting providers, the influence of providers on these security practices, and their impact on web compromise rates. We construct provider-level features on the global market for shared hosting -- containing 1,259 providers -- by gathering indicators from 442,684 domains. Exploratory factor analysis of 15 indicators identifies four main latent factors that capture security efforts: content security, webmaster security, web infrastructure security and web application security. We confirm, via a fixed-effect regression model, that providers exert significant influence over the latter two factors, which are both related to the software stack in their hosting environment. Finally, by means of GLM regression analysis of these factors on phishing and malware abuse, we show that the four security and software patching factors explain between 10% and 19% of the variance in abuse at providers, after controlling for size. For web-application security for instance, we found that when a provider moves from the bottom 10% to the best-performing 10%, it would experience 4 times fewer phishing incidents. We show that providers have influence over patch levels--even higher in the stack, where CMSes can run as client-side software--and that this influence is tied to a substantial reduction in abuse levels. ...

No domain left behind

Is Let's Encrypt democratizing encryption?

Conference paper (2017) - Maarten Aertsen, Maciej Korczyński, Giovane C.M. Moura, Samaneh Tajalizadehkhoob, Jan Van Den Berg

The 2013 National Security Agency revelations of pervasive monitoring have led to an "encryption rush" across the computer and Internet industry. To push back against massive surveillance and protect users' privacy, vendors, hosting and cloud providers have widely deployed encryption on their hardware, communication links, and applications. As a consequence, most web connections nowadays are encrypted. However, there is still a significant part of Internet traffic that is not encrypted. It has been argued that both costs and complexity associated with obtaining and deploying X.509 certificates are major barriers for widespread encryption, since these certificates are required to establish encrypted connections. To address these issues, the Electronic Frontier Foundation, Mozilla Foundation, the University of Michigan and a number of partners have set up Let's Encrypt (LE), a certificate authority that provides both free X.509 certificates and software that automates the deployment of these certificates. In this paper, we investigate if LE has been successful in democratizing encryption: we analyze certificate issuance in the first year of LE and show from various perspectives that LE adoption has an upward trend and it is in fact being successful in covering the lower-cost end of the hosting market. ...

Reputation Metrics Design to Improve Intermediary Incentives for Security of TLDs

Conference paper (2017) - Maciej Korczynski, Samaneh Tajalizadehkhoob, Arman Noroozian, Maarten Wullink, Cristian Hesselman, Michel Van Eeten

Over the years cybercriminals have misused the Domain Name System (DNS) - a critical component of the Internet - to gain profit. Despite this persisting trend, little empirical information about the security of Top-Level Domains (TLDs) and of the overall 'health' of the DNS ecosystem exists. In this paper, we present security metrics for this ecosystem and measure the operational values of such metrics using three representative phishing and malware datasets. We benchmark entire TLDs against the rest of the market. We explicitly distinguish these metrics from the idea of measuring security performance, because the measured values are driven by multiple factors, not just by the performance of the particular market player. We consider two types of security metrics: occurrence of abuse and persistence of abuse. In conjunction, they provide a good understanding of the overall health of a TLD. We demonstrate that attackers abuse a variety of free services with good reputation, affecting not only the reputation of those services, but of entire TLDs. We find that, when normalized by size, old TLDs like.com host more bad content than new generic TLDs. We propose a statistical regression model to analyze how the different properties of TLD intermediaries relate to abuse counts. We find that next to TLD size, abuse is positively associated with domain pricing (i.e. registries who provide free domain registrations witness more abuse). Last but not least, we observe a negative relation between the DNSSEC deployment rate and the count of phishing domains. ...

Evaluating the Impact of AbuseHUB on Botnet Mitigation

Report (2016) - Michel van Eeten, Qasim Lone, Giovane Moreira Moura, Hadi Asghari, Maciej Korczynski

This documents presents the final report of a two-year project to evaluate the impact of AbuseHUB, a Dutch clearinghouse for acquiring and processing abuse data on infected machines. The report was commissioned by the Netherlands Ministry of Economic Affairs, a co-funder of the development of AbuseHUB. AbuseHUB is the initiative of 9 Internet Service Providers, SIDN (the registry for the .nl top-level domain) and Surfnet (the national research and education network operator). The key objective of AbuseHUB is to improve the mitigation of botnets by its members. We set out to assess whether this objective is being reached by analyzing malware infection levels in the networks of AbuseHUB members and comparing them to those of other Internet Service Providers (ISPs). Since AbuseHUB members together comprise over 90 percent of the broadband market in the Netherlands, it also makes sense to compare how the country as a whole has performed compared to other countries. This report complements the baseline measurement report produced in December 2013 and the interim report from March 2015. We are using the same data sources as in the interim report, which is an expanded set compared to the earlier baseline report and to our 2011 study into botnet mitigation in the Netherlands. ...

Apples, oranges and hosting providers

Heterogeneity and security in the hosting market

Conference paper (2016) - Samaneh Tajalizadehkhoob, Maciej Korczyński, Arman Noroozian, Carlos Gañán, Michel Van Eeten

Hosting services are associated with various security threats, yet the market has barely been studied empirically. Most security research has relied on routing data and equates providers with Autonomous Systems, ignoring the complexity and heterogeneity of the market. To overcome these limitations, we combined passive DNS data with WHOIS data to identify providers and some of their properties. We found 45,434 hosting providers, spread around a median address space size of 1,517 IP addresses. There is surprisingly little consolidation in the market, even though its services seem amenable to economies of scale. We applied cluster analysis on several measurable characteristics of providers. This uncovered a diverse set of business profiles and an indication of what fraction of the market fits each profile. The profiles are associated with significant differences in security performance, as measured by the uptime of phishing sites. This suggests the approach provides an effective way for security researchers to take the heterogeneity of the market into account. ...

Who Gets the Boot? Analyzing Victimization by DDoS-as-a-Service

Conference paper (2016) - Arman Noroozian, Maciej Korczynski, Carlos Hernandez Ganan, Daisuke Makita, Katsunari Yoshioka, Michel van Eeten

A lot of research has been devoted to understanding the technical properties of amplification DDoS attacks and the emergence of the DDoS-as-a-service economy, especially the so-called booters. Much less is known about the consequences for victimization patterns. We profile victims via data from amplification DDoS honeypots. We develop victimization rates and present explanatory models capturing key determinants of these rates. Our analysis demonstrates that the bulk of the attacks are directed at users in access networks, not at hosting, and even less at enterprise networks. We find that victimization in broadband ISPs is highly proportional to the number of ISP subscribers and that certain countries have significantly higher or lower victim rates which are only partially explained by institutional factors such as ICT development. We also find that victimization rate in hosting networks is proportional to the number of hosted domains and number of routed IP addresses and that content popularity has a minor impact on victimization rates. Finally, we reflect on the implications of these findings for the wider trend of commoditization in cybercrime. ...

Zone poisoning

The how and where of non-secure DNS dynamic updates

Conference paper (2016) - Maciej Korczynski, Michał Król, Michel Van Eeten

This paper illuminates the problem of non-secure DNS dynamic updates, which allow a miscreant to manipulate DNS entries in the zone files of authoritative name servers. We refer to this type of attack as to zone poisoning. This paper presents the first measurement study of the vulnerability. We analyze a random sample of 2.9 million domains and the Alexa top 1 million domains and find that at least 1,877 (0.065%) and 587 (0.062%) of domains are vulnerable, respectively. Among the vulnerable domains are governments, health care providers and banks, demonstrating that the threat impacts important services. Via this study and subsequent notifications to affected parties, we aim to improve the security of the DNS ecosystem. ...