Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models

Interspeech 2024

Publication date: September 5, 2024

Minh Van Nguyen, Franck Dernoncourt, David Seunghyun Yoon, Hanieh Deilamsalehy, Hao Tan, Ryan A. Rossi, Quan Hung Tran, Trung Bui, Thien Nguyen

In this paper, we introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives. Despite advances in speech recognition, text-based speaker identification (SpeakerID) has received limited attention: existing studies focus primarily on multimodal inputs and lack large-scale, diverse datasets for effective model training. To address these gaps, we present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources. We propose transformer-based models tailored for SpeakerID that leverage contextual cues within dialogues to accurately attribute speaker names. Through extensive experimentation, our best model achieves a precision of 80.3%, setting a new benchmark for text-based SpeakerID. We will make the data and code publicly available upon publication.