Decoding Legislative Discourse: Transformer-Based Topic Modeling of U.S. Congressional Hearings
A Comparative Analysis of Standard and Zero-Shot BERTopic
N.R. Rebecca (TU Delft - Electrical Engineering, Mathematics and Computer Science)
S. Tan – Mentor (TU Delft - Interactive Intelligence)
E. Salas Gironés – Mentor (TU Delft - Interactive Intelligence)
O.E. Scharenborg – Graduation committee member (TU Delft - Multimedia Computing)
Abstract
Congressional hearings are central to the legislative process, yet their analysis is hindered by the volume and complexity of the transcripts. While recent advances in Natural Language Processing (NLP) have enabled automated analysis of political discourse, conventional topic modeling methods often struggle to produce semantically coherent topics because they rely on context-free word frequencies. This paper evaluates a transformer-based topic modeling technique, BERTopic, through a detailed case study of policy discussions. Two variants are compared on U.S. congressional hearing transcripts from 2021 to 2024: (1) a parameter-tuned standard model and (2) a zero-shot variant. The results show that the zero-shot variant achieves competitive topic coherence with improved interpretability and stability, making it a useful resource for policymakers and researchers alike. The paper thus establishes a foundational methodological framework for automated legislative text analysis and outlines the trade-offs between unsupervised and semi-supervised topic modeling in political applications.