Decoding Legislative Discourse: Transformer-Based Topic Modeling of U.S. Congressional Hearings
A Comparative Analysis of Standard and Zero-Shot BERTopic
N.R. Rebecca (TU Delft - Electrical Engineering, Mathematics and Computer Science)
S. Tan – Mentor (TU Delft - Interactive Intelligence)
E. Salas Gironés – Mentor (TU Delft - Interactive Intelligence)
O.E. Scharenborg – Graduation committee member (TU Delft - Multimedia Computing)
Abstract
Congressional hearings are central to the legislative process, yet their analysis is hindered by the volume and complexity of the transcripts. While recent advances in Natural Language Processing (NLP) have enabled automated analysis of political discourse, conventional topic modeling methods often struggle to produce semantically coherent topics because they rely on context-free word frequencies. This paper evaluates a transformer-based topic modeling technique, BERTopic, through a detailed case study of policy discussions. Two variants are compared on U.S. congressional hearing transcripts from 2021 to 2024: (1) a parameter-tuned standard model and (2) a zero-shot variant. The results show that the zero-shot variant achieves competitive topic coherence with improved interpretability and stability, making it a useful resource for policymakers and researchers alike. The paper thus establishes a foundational methodological framework for automated legislative text analysis and outlines the trade-offs between unsupervised and semi-supervised topic modeling in political applications.