Congressional hearings are central to the legislative process, yet their analysis is hindered by the sheer volume and complexity of the transcripts. While recent advances in Natural Language Processing (NLP) have enabled automated analysis of political discourse, conventional topic modeling methods often struggle to produce semantically coherent topics because they rely on context-free word frequencies. This paper evaluates a transformer-based topic modeling technique, focusing on its application to policy discussions through a detailed case study. Two variants of BERTopic are considered: (1) a parameter-tuned model and (2) a zero-shot variant, both evaluated on U.S. congressional hearing transcripts from 2021 to 2024. The results show that the zero-shot variant achieves competitive coherence with greater interpretability and stability, making it a useful tool for policymakers and researchers alike. This paper establishes a foundational methodological framework for automated legislative text analysis and outlines the trade-offs between unsupervised and semi-supervised topic modeling in political applications.