The Algorithmic Self-Portrait
Deconstructing Memory in ChatGPT
Abhisek Dash (Max Planck Institute for Software Systems)
Soumi Das (Max Planck Institute for Software Systems)
Elisabeth Kirsten (Ruhr-Universität Bochum)
Qinyuan Wu (Max Planck Institute for Software Systems)
Sai Keerthana Karnam (Indian Institute of Technology Kharagpur)
Krishna P. Gummadi (Max Planck Institute for Software Systems)
Thorsten Holz (Max Planck Institute for Security and Privacy)
Muhammad Bilal Zafar (Ruhr-Universität Bochum)
Savvas Zannettou (TU Delft - Technology, Policy and Management)
Abstract
To enable personalized and context-aware interactions, conversational AI systems have introduced a new mechanism: Memory. Memory creates what we refer to as the Algorithmic Self-portrait, a new form of personalization derived from information that users disclose about themselves in private conversations. While memory enables more coherent exchanges, the underlying processes of memory creation remain opaque, raising critical questions about data sensitivity, user agency, and the fidelity of the resulting portrait. To bridge this research gap, we analyze 2,050 memory entries from 80 real-world ChatGPT users. Our analyses reveal three key findings: (1) a striking 96% of memories in our dataset are created unilaterally by the conversational system, potentially shifting agency away from the user; (2) memories in our dataset contain a rich mix of GDPR-defined personal data (in 28% of memories) along with psychological insights about participants (in 52% of memories); and (3) a significant majority of the memories (84%) are directly grounded in user context, indicating faithful representation of the conversations. Finally, we introduce a framework, Attribution Shield, that anticipates these inferences, alerts users to potentially sensitive memory inferences, and suggests query reformulations that protect personal information without sacrificing utility.