Recent work has shown that generative data augmentation, where synthetic samples generated from deep generative models are used to augment the training dataset, benefits certain NLP tasks. In this work, we extend this approach to the task of dialog state tracking for goal-oriented dialogs. Since goal-oriented dialogs naturally exhibit a hierarchical structure over utterances and related annotations, deep generative data augmentation for this task requires the generative model to be aware of the coherence among multiple types of dialog information. We propose the Variational Hierarchical Dialog Autoencoder (VHDA) for modeling all aspects of goal-oriented dialogs, including linguistic features and underlying structured annotations, namely speaker information, dialog acts, and goals. The proposed architecture models each aspect of goal-oriented dialogs using interconnected latent variables and learns to generate coherent goal-oriented dialogs from the latent spaces. To overcome the issues that arise when training complex variational models, we propose appropriate training strategies. Experiments show that our model improves the robustness of downstream dialog state trackers through generative data augmentation on various dialog datasets. We also discover additional benefits of our unified approach to modeling goal-oriented dialogs: the ability to generate more coherent and diverse utterances and user actions, outperforming previous strong baselines in related tasks.