S. Zannettou | TU Delft Repository

Does Ad-Free Mean Less Data Collection? An Empirical Study of Platform Data Practices and User Expectations

Conference paper (2026) - Sepehr Mousavi, Abhisek Dash, Savvas Zannettou, Krishna P. Gummadi

Online platforms increasingly offer "paid"ad-free subscriptions as an alternative to the traditional "free"ad-based model. The transition to ad-free models ostensibly removes advertising as a key justification for data processing under the GDPR. So, normatively, platforms should collect less user data. However, platforms may justify continued data collection as a means to provide an improved, personalized experience. This tension between privacy principles and platform incentives raises a critical underexplored question: do data collection practices vary between ad-free and ad-based subscription models? In this paper, we shed light on this important privacy issue by investigating the alignment between platform data collection practices and related user expectations. With respect to data collection process, our analyses of data exports from three major online platforms - Instagram, Facebook, and X - reveal that these platforms continue to retain or collect some ad-related data, even in ad-free subscriptions. With respect to user expectations, our survey among 255 participants on Prolific reveals that 69% of the participants normatively expect data collection to be reduced, indicating their expectation of improved digital privacy in an ad-free model. However, when asked what they think actually happens, 63% of these participants believed that platforms would still collect about the same amount of data, highlighting skepticism about platform practices. Our findings not only indicate a significant disconnect between data practices and normative user expectations, but also raise serious questions about platform compliance with core GDPR principles, such as purpose limitation, data minimization, and transparency. ...

Gold Standard or Gold-Plated? Human Practices of Triple Verification in CSAM Takedown

Conference paper (2026) - Melissa Rottier, Michel van Eeten, Savvas Zannettou

Child sexual abuse material (CSAM) presents a critical challenge for online safety, yet the verification procedures that determine which items are classified as CSAM remain poorly understood. Triple verification (requiring three reviewers to agree) is promoted as a safeguard, but little is known about how it is implemented, how it is perceived by experts, and how voting conditions affect reliability. We address this gap through a mixed-methods study. We interviewed 14 experts from seven organizations (e.g., law enforcement, hotlines, etc.) to map current verification practices, then ran an inter-reliability experiment with Dutch National Police experts who reviewed 2,031 images and videos under different voting conditions (blind vs. non-blind, varied order). Finally, we held a focus group to explore the reasons behind disagreements. We find that practices vary widely, perceptions of triple verification reflect both safeguards and burdens, and expert agreement depends on voting conditions and content type. ...

The Algorithmic Self-Portrait

Deconstructing Memory in ChatGPT

Conference paper (2026) - Abhisek Dash, Soumi Das, Elisabeth Kirsten, Qinyuan Wu, Sai Keerthana Karnam, Krishna P. Gummadi, Thorsten Holz, Muhammad Bilal Zafar, Savvas Zannettou

To enable personalized and context-aware interactions, conversational AI systems have introduced a new mechanism: Memory. Memory creates what we refer to as the Algorithmic Self-portrait - -a new form of personalization derived from users' self-disclosed information divulged within private conversations. While memory enables more coherent exchanges, the underlying processes of memory creation remain opaque, raising critical questions about data sensitivity, user agency, and the fidelity of the resulting portrait. To bridge this research gap, we analyze 2,050 memory entries from 80 real-world ChatGPT users. Our analyses reveal three key findings: (1) a striking 96% of memories in our dataset are created unilaterally by the conversational system, potentially shifting agency away from the user; (2) Memories, in our dataset, contain a rich mix of GDPR-defined personal data (in 28% memories) along with psychological insights about participants (in 52% memories); and (3) A significant majority of the memories (84%) are directly grounded in user context, indicating faithful representation of the conversations. Finally, we introduce a framework - - Attribution Shield - -that anticipates these inferences, alerts about potentially sensitive memory inferences, and suggests query reformulations to protect personal information without sacrificing utility. ...

Bowling with ChatGPT

On the Evolving User Interactions with Conversational AI Systems

Conference paper (2026) - Sai Keerthana Karnam, Abhisek Dash, Krishna Gummadi, Animesh Mukherjee, Ingmar Weber, Savvas Zannettou

Recent studies have discussed how users are increasingly using conversational AI systems, powered by LLMs, for information seeking, decision support, and even emotional support. However, these macro-level observations offer limited insight into how the purpose of these interactions shifts over time, how users frame their interactions with the system, and how steering dynamics unfold in these human-AI interactions. To examine these evolving dynamics, we gathered and analyzed a unique dataset InVivoGPT: consisting of 825K ChatGPT interactions, donated by 300 users through their GDPR data rights. Our analyses reveal three key findings. First, participants increasingly turn to ChatGPT for a broader range of purposes, including substantial growth in sensitive domains such as health and mental health. Second, interactions become more socially framed: the system anthropomorphizes itself at rising rates, participants more frequently treat it as a companion, and personal data disclosure becomes both more common and more diverse. Third, conversational steering becomes more prominent, especially after the release of GPT-4o, with conversations where the participants followed a model-initiated suggestion quadrupling over the period of our dataset. Overall, our results show that conversational AI systems are shifting from functional tools to social partners, raising important questions about their design and governance. ...

UnsafeBench

Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images

Conference paper (2025) - Yiting Qu, Xinyue Shen, Yixin Wu, Michael Backes, Savvas Zannettou, Yang Zhang

With the advent of text-to-image models and concerns about their misuse, developers are increasingly relying on image safety classifiers to moderate their generated unsafe images. Yet, the performance of current image safety classifiers remains unknown for both real-world and AI-generated images. In this work, we propose UnsafeBench, a benchmarking framework that evaluates the effectiveness and robustness of image safety classifiers, with a particular focus on the impact of AI-generated images on their performance. First, we curate a large dataset of 10K real-world and AI-generated images that are annotated as safe or unsafe based on a set of 11 unsafe categories of images (sexual, violent, hateful, etc.). Then, we evaluate the effectiveness and robustness of five popular image safety classifiers, as well as three classifiers that are powered by general-purpose visual language models. Our assessment indicates that existing image safety classifiers are not comprehensive and effective enough to mitigate the multifaceted problem of unsafe images. Also, there exists a distribution shift between real-world and AI-generated images in image qualities, styles, and layouts, leading to degraded effectiveness and robustness. Motivated by these findings, we build a comprehensive image moderation tool called PerspectiveVision, which improves the effectiveness and robustness of existing classifiers, especially on AI-generated images. UnsafeBench and PerspectiveVision can aid the research community in better understanding the landscape of image safety classification in the era of generative AI. ...

What News Do People Get on Social Media?

Analyzing Exposure and Consumption of News Through Data Donations

Conference paper (2024) - Salim Chouaki, Abhijnan Chakraborty, Oana Goga, Savvas Zannettou

Understanding how exposure to news on social media impacts public discourse and exacerbates political polarization is a significant endeavor in both computer and social sciences. Unfortunately, progress in this area is hampered by limited access to data due to the closed nature of social media platforms. Consequently, prior studies have been constrained to considering only fragments of users' news exposure and reactions. To overcome this obstacle, we present an innovative measurement approach centered on donating personal data for scientific purposes, facilitated through a privacy-preserving tool that captures users' interactions with news on Facebook. This approach offers a nuanced perspective on users' news exposure and consumption, encompassing different types of news exposure: selective, incidental, algorithmic, and targeted, driven by the diverse underlying mechanisms governing news appearance on users' feeds. Our analysis of data from 472 participants based in the U.S. reveals several interesting findings. For instance, users are more prone to encountering misinformation because of their active selection of low-quality news sources rather than being exposed solely due to friends or platform algorithms. Furthermore, our study uncovers that users are open to engaging with news sources with opposite political ideology as long as these interactions are not visible to their immediate social circles. Overall, our study showcases the viability of data donation as a means to provide clarity to longstanding questions in this field, offering new perspectives on the intricate dynamics of social media news consumption and its effects. ...

Analyzing User Engagement with TikTok's Short Format Video Recommendations using Data Donations

Conference paper (2024) - Savvas Zannettou, Olivia Nemes-Nemeth, Oshrat Ayalon, Angelica Goetzen, Krishna P. Gummadi, Elissa M. Redmiles, Franziska Roesner

Short-format videos have exploded on platforms like TikTok, Instagram, and YouTube. Despite this, the research community lacks large-scale empirical studies into how people engage with short-format videos and the role of recommendation systems that offer endless streams of such content. In this work, we analyze user engagement on TikTok using data we collect via a data donation system that allows TikTok users to donate their data. We recruited 347 TikTok users and collected 9.2M TikTok video recommendations they received. By analyzing user engagement, we find that the average daily usage time increases over the users’ lifetime while the user attention remains stable at around 45%. We also find that users like more videos uploaded by people they follow than those recommended by people they do not follow. Our study offers valuable insights into how users engage with short-format videos on TikTok and lessons learned from designing a data donation system. ...

TikTok and the Art of Personalization

Investigating Exploration and Exploitation on Social Media Feeds

Conference paper (2024) - Karan Vombatkere, Sepehr Mousavi, Savvas Zannettou, Franziska Roesner, Krishna P. Gummadi

Recommendation algorithms for social media feeds often function as black boxes from the perspective of users. We aim to detect whether social media feed recommendations are personalized to users, and to characterize the factors contributing to personalization in these feeds. We introduce a general framework to examine a set of social media feed recommendations for a user as a timeline. We label items in the timeline as the result of exploration vs. exploitation of the user's interests on the part of the recommendation algorithm and introduce a set of metrics to capture the extent of personalization across user timelines. We apply our framework to a real TikTok dataset and validate our results using a baseline generated from automated TikTok bots, as well as a randomized baseline. We also investigate the extent to which factors such as video viewing duration, liking, and following drive the personalization of content on TikTok. Our results demonstrate that our framework produces intuitive and explainable results, and can be used to audit and understand personalization in social media feeds. ...

On the Globalization of the QAnon Conspiracy Theory Through Telegram

Conference paper (2023) - Mohamad Hoseini, Philipe Melo, Fabricio Benevenuto, Anja Feldmann, Savvas Zannettou

QAnon is a far-right conspiracy theory that has implications in the real world, with supporters of the theory participating in real-world violent acts like the US capitol attack in 2021. At the same time, the QAnon theory started evolving into a global phenomenon by attracting followers across the globe and, in particular, in Europe, hence it is imperative to understand how QAnon has become a worldwide phenomenon and how this dissemination has been happening in the online space. This paper performs a large-scale data analysis of QAnon through Telegram by collecting 4.4M messages posted in 161 QAnon groups/channels. Using Google's Perspective API, we analyze the toxicity of QAnon content across languages and over time. Also, using a BERT-based topic modeling approach, we analyze the QAnon discourse across multiple languages. Among other things, we find that the German language is prevalent in our QAnon dataset, even overshadowing English after 2020. Also, we find that content posted in German and Portuguese tends to be more toxic compared to English. Our topic modeling indicates that QAnon supporters discuss various topics of interest within far-right movements, including world politics, conspiracy theories, COVID-19, and the anti-vaccination movement. Taken all together, we perform the first multilingual study on QAnon through Telegram and paint a nuanced overview of the globalization of QAnon. ...

On the Evolution of (Hateful) Memes by Means of Multimodal Contrastive Learning

Conference paper (2023) - Yiting Qu, Xinlei He, Shannon Pierson, Michael Backes, Yang Zhang, Savvas Zannettou

The dissemination of hateful memes online has adverse effects on social media platforms and the real world. Detecting hateful memes is challenging, one of the reasons being the evolutionary nature of memes; new hateful memes can emerge by fusing hateful connotations with other cultural ideas or symbols. In this paper, we propose a framework that leverages multimodal contrastive learning models, in particular OpenAI's CLIP, to identify targets of hateful content and systematically investigate the evolution of hateful memes. We find that semantic regularities exist in CLIP-generated embeddings that describe semantic relationships within the same modality (images) or across modalities (images and text). Leveraging this property, we study how hateful memes are created by combining visual elements from multiple images or fusing textual information with a hateful image. We demonstrate the capabilities of our framework for analyzing the evolution of hateful memes by focusing on antisemitic memes, particularly the Happy Merchant meme. Using our framework on a dataset extracted from 4chan, we find 3.3K variants of the Happy Merchant meme, with some linked to specific countries, persons, or organizations. We envision that our framework can be used to aid human moderators by flagging new variants of hateful memes so that moderators can manually verify them and mitigate the problem of hateful content online. ...

Lambretta

Learning to Rank for Twitter Soft Moderation

Conference paper (2023) - Pujan Paudel, Jeremy Blackburn, Emiliano De Cristofaro, Savvas Zannettou, Gianluca Stringhini

To curb the problem of false information, social media platforms like Twitter started adding warning labels to content discussing debunked narratives, with the goal of providing more context to their audiences. Unfortunately, these labels are not applied uniformly and leave large amounts of false content unmoderated. This paper presents LAMBRETTA, a system that automatically identifies tweets that are candidates for soft moderation using Learning To Rank (LTR). We run Lambretta on Twitter data to moderate false claims related to the 2020 US Election and find that it flags over 20 times more tweets than Twitter, with only 3.93% false positives and 18.81% false negatives, outperforming alternative state-of-the-art methods based on keyword extraction and semantic search. Overall, LAMBRETTA assists human moderators in identifying and flagging false information on social media. ...

Why So Toxic?

Measuring and Triggering Toxic Behavior in Open-Domain Chatbots

Conference paper (2022) - Wai Man Si, Michael Backes, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Savvas Zannettou, Yang Zhang

Chatbots are used in many applications, e.g., automated agents, smart home assistants, interactive characters in online games, etc. Therefore, it is crucial to ensure they do not behave in undesired manners, providing offensive or toxic responses to users. This is not a trivial task as state-of-the-art chatbot models are trained on large, public datasets openly collected from the Internet. This paper presents a first-of-its-kind, large-scale measurement of toxicity in chatbots. We show that publicly available chatbots are prone to providing toxic responses when fed toxic queries. Even more worryingly, some non-toxic queries can trigger toxic responses too. We then set out to design and experiment with an attack, ToxicBuddy, which relies on fine-tuning GPT-2 to generate non-toxic queries that make chatbots respond in a toxic manner. Our extensive experimental evaluation demonstrates that our attack is effective against public chatbot models and outperforms manually-crafted malicious queries proposed by previous work. We also evaluate three defense mechanisms against ToxicBuddy, showing that they either reduce the attack performance at the cost of affecting the chatbot's utility or are only effective at mitigating a portion of the attack. This highlights the need for more research from the computer security and online safety communities to ensure that chatbot models do not hurt their users. Overall, we are confident that ToxicBuddy can be used as an auditing tool and that our work will pave the way toward designing more effective defenses for chatbot safety. ...

Effect of Social Networking on Real-World Events

Journal article (2022) - Jeremy Blackburn, Savvas Zannettou

The articles in this special issue focus on the emerging effects that social media can have on the real world. Social media has quickly become not just ubiquitous, but also integral to society. A large portion of social media's quick ascent was due to its modeling of real-world relationships, meaning the offline world informed the development and adoption of the online world. Recently, however, it has become apparent that this effect is not a one-way street. For example, the spread of mis- and disinformation, the spread of conspiracy theories, and the rise of extremism can all be attributed, in part, to social media. Most previous research has studied the real world and social media in isolation. However, these worlds are interconnected with each world having a substantial impact and influence on the other. For instance, hateful rhetoric disseminated via social media can encourage physical meetings that can quickly transform rhetoric into violent actions. Overall, it is of paramount importance, as a research community, to devote research resources into understanding and analyzing the exogenic effects of social media to the real world with the goal to further improves our lives. ...

TrollMagnifier

Detecting State-Sponsored Troll Accounts on Reddit

Conference paper (2022) - Mohammad Hammas Saeed, Shiza Ali, Jeremy Blackburn, Emiliano De Cristofaro, S. Zannettou, Gianluca Stringhini

Growing evidence points to recurring influence campaigns on social media, often sponsored by state actors aiming to manipulate public opinion on sensitive political topics. Typically, campaigns are performed through instrumented accounts, known as troll accounts; despite their prominence, however, little work has been done to detect these accounts in the wild. In this paper, we present TROLLMAGNIFIER, a detection system for troll accounts. Our key observation, based on analysis of known Russian-sponsored troll accounts identified by Reddit, is that they show loose coordination, often interacting with each other to further specific narratives. Therefore, troll accounts controlled by the same actor often show similarities that can be leveraged for detection. TROLLMAGNIFIER learns the typical behavior of known troll accounts and identifies more that behave similarly. We train TROLLMAGNIFIER on a set of 335 known troll accounts and run it on a large dataset of Reddit accounts. Our system identifies 1,248 potential troll accounts; we then provide a multi-faceted analysis to corroborate the correctness of our classification. In particular, 66% of the detected accounts show signs of being instrumented by malicious actors (e.g., they were created on the same exact day as a known troll, they have since been suspended by Reddit, etc.). They also discuss similar topics as the known troll accounts and exhibit temporal synchronization in their activity. Overall, we show that using TROLLMAGNIFIER, one can grow the initial knowledge of potential trolls provided by Reddit by over 300%. ...

"how over is it?" Understanding the Incel Community on YouTube

Journal article (2021) - Kostantinos Papadamou, Savvas Zannettou, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Michael Sirivianos

YouTube is by far the largest host of user-generated video content worldwide. Alas, the platform has also come under fire for hosting inappropriate, toxic, and hateful content. One community that has often been linked to sharing and publishing hateful and misogynistic content are the Involuntary Celibates (Incels), a loosely defined movement ostensibly focusing on men's issues. In this paper, we set out to analyze the Incel community on YouTube by focusing on this community's evolution over the last decade and understanding whether YouTube's recommendation algorithm steers users towards Incel-related videos. We collect videos shared on Incel communities within Reddit and perform a data-driven characterization of the content posted on YouTube. Among other things, we find that the Incel community on YouTube is getting traction and that, during the last decade, the number of Incel-related videos and comments rose substantially. We also find that users have a 6.3% chance of being suggested an Incel-related video by YouTube's recommendation algorithm within five hops when starting from a non Incel-related video. Overall, our findings paint an alarming picture of online radicalization: not only Incel activity is increasing over time, but platforms may also play an active role in steering users towards such extreme content. ...

"go eat a bat, chang!"

On the emergence of sinophobic behavior onweb communities in the face of COVID-19

Conference paper (2021) - Fatemeh Tahmasbi, Leonard Schild, Chen Ling, Jeremy Blackburn, Gianluca Stringhini, Yang Zhang, Savvas Zannettou

The outbreak of the COVID-19 pandemic has changed our lives in unprecedented ways. In the face of the projected catastrophic consequences, most countries have enacted social distancing measures in an attempt to limit the spread of the virus. Under these conditions, the Web has become an indispensable medium for information acquisition, communication, and entertainment. At the same time, unfortunately, the Web is being exploited for the dissemination of potentially harmful and disturbing content, such as the spread of conspiracy theories and hateful speech towards specific ethnic groups, in particular towards Chinese people and people of Asian descent since COVID-19 is believed to have originated from China. In this paper, we make a first attempt to study the emergence of Sinophobic behavior on the Web during the outbreak of the COVID-19 pandemic. We collect two large datasets from Twitter and 4chan's Politically Incorrect board (/pol/) over a time period of approximately five months and analyze them to investigate whether there is a rise or important differences with regard to the dissemination of Sinophobic content. We find that COVID-19 indeed drives the rise of Sinophobia on the Web and that the dissemination of Sinophobic content is a cross-platform phenomenon: it exists on fringe Web communities like /pol/, and to a lesser extent on mainstream ones like Twitter. Using word embeddings over time, we characterize the evolution of Sinophobic slurs on both Twitter and /pol/. Finally, we find interesting differences in the context in which words related to Chinese people are used on the Web before and after the COVID-19 outbreak: on Twitter we observe a shift towards blaming China for the situation, while on /pol/ we find a shift towards using more (and new) Sinophobic slurs. ...

Do Platform Migrations Compromise Content Moderation? Evidence from r/The_Donald and r/Incels

Journal article (2021) - Manoel Horta Ribeiro, Shagun Jhaver, Savvas Zannettou, Jeremy Blackburn, Gianluca Stringhini, Emiliano De Cristofaro, Robert West

When toxic online communities on mainstream platforms face moderation measures, such as bans, they may migrate to other platforms with laxer policies or set up their own dedicated websites. Previous work suggests that within mainstream platforms, community-level moderation is effective in mitigating the harm caused by the moderated communities. It is, however, unclear whether these results also hold when considering the broader Web ecosystem. Do toxic communities continue to grow in terms of their user base and activity on the new platforms? Do their members become more toxic and ideologically radicalized? In this paper, we report the results of a large-scale observational study of how problematic online communities progress following community-level moderation measures. We analyze data from r/The_Donald and r/Incels, two communities that were banned from Reddit and subsequently migrated to their own standalone websites. Our results suggest that, in both cases, moderation measures significantly decreased posting activity on the new platform, reducing the number of posts, active users, and newcomers. In spite of that, users in one of the studied communities (r/The_Donald) showed increases in signals associated with toxicity and radicalization, which justifies concerns that the reduction in activity may come at the expense of a more toxic and radical community. Overall, our results paint a nuanced portrait of the consequences of community-level moderation and can inform their design and deployment. ...

Dissecting the Meme Magic

Understanding Indicators of Virality in Image Memes

Journal article (2021) - Chen Ling, Ihab Abuhilal, Jeremy Blackburn, Emiliano De Cristofaro, Savvas Zannettou, Gianluca Stringhini

Despite the increasingly important role played by image memes, we do not yet have a solid understanding of the elements that might make a meme go viral on social media. In this paper, we investigate what visual elements distinguish image memes that are highly viral on social media from those that do not get re-shared, across three dimensions: composition, subjects, and target audience. Drawing from research in art theory, psychology, marketing, and neuroscience, we develop a codebook to characterize image memes, and use it to annotate a set of 100 image memes collected from 4chan's Politically Incorrect Board (/pol/). On the one hand, we find that highly viral memes are more likely to use a close-up scale, contain characters, and include positive or negative emotions. On the other hand, image memes that do not present a clear subject the viewer can focus attention on, or that include long text are not likely to be re-shared by users. We train machine learning models to distinguish between image memes that are likely to go viral and those that are unlikely to be re-shared, obtaining an AUC of 0.866 on our dataset. We also show that the indicators of virality identified by our model can help characterize the most viral memes posted on mainstream online social networks too, as our classifiers are able to predict 19 out of the 20 most popular image memes posted on Twitter and Reddit between 2016 and 2018. Overall, our analysis sheds light on what indicators characterize viral and non-viral visual content online, and set the basis for developing better techniques to create or moderate content that is more likely to catch the viewer's attention. ...