Progress in the AMIDA speaker diarization system for meeting data

Author: Leeuwen, D.A. van · Konečný, M.
Institution: TNO Defensie en Veiligheid
Source: 2nd Annual Classification of Events, Activities and Relationships, CLEAR 2007 and Rich Transcription, RT 2007, 8 May 2007 through 11 May 2007, Baltimore, MD, Conference code: 72688, 4625 LNCS, 475-483
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Identifier: 240920
doi: 10.1007/978-3-540-68585-2_44
Keywords: Acoustics and Audiology · Error analysis · Transcription · Cluster merging · Error Rate (ER) · Heidelberg (CO) · International (CO) · Multi-modal · Speaker diarization · Speech Activity Detection (SAD) · Tunable parameters · Speech


In this paper we describe the AMIDA speaker diarization system as it was submitted to the NIST Rich Transcription evaluation 2007 for conference room data. This is done in the context of the history of this system and other speaker diarization systems. One of the goals of our system is to have as few tunable parameters as possible, while maintaining performance. The system consists of a BIC segmentation/clustering initialization, followed by a combined re-segmentation and cluster-merging algorithm. The Diarization Error Rate (DER) of our best system is 17.0%, accounting for overlapping speech. However, we find that a slight alteration of the Speech Activity Detection models has a large impact on the speaker DER. © 2008 Springer-Verlag Berlin Heidelberg.
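
For readers unfamiliar with the metric, the following is a minimal sketch of how a Diarization Error Rate is commonly computed: the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total scored speaker time. The function name, parameter names, and the durations used below are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the paper, and the NIST scoring tools apply further conventions (such as scoring collars) that are not modelled here.

```python
def diarization_error_rate(missed, false_alarm, speaker_error, total_speech):
    """Return DER as a fraction of the total scored speaker time.

    All arguments are durations in seconds. When overlapping speech is
    scored, total_speech is assumed to count each simultaneous speaker
    separately (an assumption of this sketch).
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + speaker_error) / total_speech


# Hypothetical durations, for illustration only (not results from the paper).
der = diarization_error_rate(
    missed=20.0,         # reference speech that the system labelled as non-speech
    false_alarm=10.0,    # non-speech that the system labelled as speech
    speaker_error=30.0,  # speech attributed to the wrong speaker
    total_speech=400.0,  # total scored speaker time
)
print(f"DER = {der:.1%}")  # prints "DER = 15.0%"
```

Because the missed and false-alarm terms come directly from speech activity detection, changes to the SAD models feed straight into the numerator, which is consistent with the sensitivity the abstract reports.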