Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

Interspeech 2023

Publication date: August 24, 2023

Viet Lai, Abel Salinas, Hao Tan, Trung Bui, Quan Hung Tran, David Seunghyun Yoon, Hanieh Deilamsalehy, Franck Dernoncourt, Thien Nguyen

Punctuation restoration is an important task in automatic speech recognition (ASR): it restores the syntactic structure of ASR output to improve readability. Although punctuated text is abundant in written documents, the discrepancy between written text and ASR output limits how useful written text is for training punctuation restoration systems that operate on ASR output. This paper proposes a reinforcement learning method that exploits in-topic written texts and recent advances in large pre-trained generative language models to bridge this gap. Experiments show that the method achieves state-of-the-art performance on two important datasets for punctuation restoration.
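To make the task concrete, the sketch below shows one common way to frame punctuation restoration as per-token labeling, and how a punctuated written sentence could be turned into an unpunctuated, lowercased input resembling ASR output. It is an illustration only, not the method proposed in the paper: the label set, the function names `to_training_pair` and `restore`, and the capitalization rule are all assumptions introduced here.

```python
import re

# Hypothetical label set (an assumption; the abstract does not specify one).
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}
MARKS = {label: mark for mark, label in PUNCT_LABELS.items()}


def to_training_pair(text):
    """Turn punctuated written text into (tokens, labels).

    Each token is lowercased and stripped of punctuation to mimic raw ASR
    output; its label names the punctuation mark that followed it ("O" if none).
    """
    tokens, labels = [], []
    for raw in text.split():
        label = PUNCT_LABELS.get(raw[-1], "O")
        word = re.sub(r"[^\w']", "", raw).lower()
        if not word:  # skip items that were punctuation only
            continue
        tokens.append(word)
        labels.append(label)
    return tokens, labels


def restore(tokens, labels):
    """Rebuild readable text from tokens and predicted labels.

    Capitalizing the first word of each sentence is an extra illustrative
    step, not something described in the abstract.
    """
    out = []
    capitalize = True
    for token, label in zip(tokens, labels):
        word = token.capitalize() if capitalize else token
        out.append(word + MARKS.get(label, ""))
        capitalize = label in ("PERIOD", "QUESTION")
    return " ".join(out)


tokens, labels = to_training_pair("Hello, how are you? I am fine.")
print(tokens)
print(labels)
print(restore(tokens, labels))
```

In this framing a restoration model would predict `labels` from `tokens` alone. Pairs built this way from written text differ from real ASR output (no disfluencies or recognition errors), which is the discrepancy the abstract refers to.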