The Algorithmic Self-Portrait

Deconstructing Memory in ChatGPT

Conference Paper (2026)
Author(s)

Abhisek Dash (Max Planck Institute for Software Systems)

Soumi Das (Max Planck Institute for Software Systems)

Elisabeth Kirsten (Ruhr-Universität Bochum)

Qinyuan Wu (Max Planck Institute for Software Systems)

Sai Keerthana Karnam (Indian Institute of Technology Kharagpur)

Krishna P. Gummadi (Max Planck Institute for Software Systems)

Thorsten Holz (Max Planck Institute for Security and Privacy)

Muhammad Bilal Zafar (Ruhr-Universität Bochum)

Savvas Zannettou (TU Delft - Technology, Policy and Management)

Research Group
Organisation & Governance
DOI related publication
https://doi.org/10.1145/3774904.3792671 Final published version
Publication Year
2026
Language
English
Pages (from-to)
3471-3482
Publisher
ACM
ISBN (electronic)
9798400723070
Event
35th ACM Web Conference, WWW 2026 (2026-06-29 - 2026-07-03), Dubai, United Arab Emirates
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

To enable personalized and context-aware interactions, conversational AI systems have introduced a new mechanism: Memory. Memory creates what we refer to as the Algorithmic Self-portrait, a new form of personalization derived from users' self-disclosed information divulged within private conversations. While memory enables more coherent exchanges, the underlying processes of memory creation remain opaque, raising critical questions about data sensitivity, user agency, and the fidelity of the resulting portrait. To bridge this research gap, we analyze 2,050 memory entries from 80 real-world ChatGPT users. Our analyses reveal three key findings: (1) a striking 96% of memories in our dataset are created unilaterally by the conversational system, potentially shifting agency away from the user; (2) memories in our dataset contain a rich mix of GDPR-defined personal data (in 28% of memories) along with psychological insights about participants (in 52% of memories); and (3) a significant majority of the memories (84%) are directly grounded in user context, indicating faithful representation of the conversations. Finally, we introduce a framework, Attribution Shield, that anticipates these inferences, alerts users to potentially sensitive memory inferences, and suggests query reformulations to protect personal information without sacrificing utility.