Task-Completion Dialogue Policy Learning via Monte Carlo Tree Search with Dueling Network

Sihan Wang; Kaijie Zhou; Kunfeng Lai; Jianping Shen

Dialog and Interactive Systems (Long Paper)

Gather-2B: Nov 17 (10:00-12:00 UTC)

Abstract: We introduce a framework of Monte Carlo Tree Search with a Double-Q Dueling network (MCTS-DDU) for task-completion dialogue policy learning. Unlike previous deep model-based reinforcement learning methods, which use background planning and may suffer from low-quality simulated experiences, MCTS-DDU performs decision-time planning based on dialogue state search trees built by Monte Carlo simulations, and is therefore robust to simulation errors. This idea arises naturally in human behavior, e.g., predicting others' responses before deciding on our own actions. In the simulated movie-ticket booking task, our method significantly outperforms background planning approaches. Detailed ablation studies demonstrate the effectiveness of MCTS and of the dueling network, and we also compare the performance upper bounds of the two planning approaches (background and decision-time).
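The abstract names three standard ingredients: a dueling network (which splits Q(s, a) into a state value V(s) and action advantages A(s, a)), Double-Q learning (which decouples action selection from action evaluation in the bootstrap target), and Monte Carlo Tree Search (which grows a search tree at decision time). The sketch below shows the common textbook form of each in Python with NumPy. It is an illustration under assumptions, not the authors' implementation: the function names, the discount `gamma=0.9`, and the exploration constant `c=1.4` are hypothetical, UCB1 is used as a generic UCT selection rule, and the abstract does not specify how MCTS-DDU couples the dueling network to the tree search.

```python
import math

import numpy as np


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').

    Subtracting the mean advantage makes V and A identifiable, so the
    average Q-value over actions equals V(s).
    """
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean()


def double_q_target(reward, next_q_online, next_q_target, gamma=0.9, done=False):
    """Double-Q bootstrap target for one transition.

    The online network picks the greedy next action; the target network
    scores it. This reduces the overestimation bias of max-based targets.
    """
    if done:
        return float(reward)
    best_action = int(np.argmax(next_q_online))
    return float(reward + gamma * next_q_target[best_action])


def uct_select(visit_counts, value_sums, c=1.4):
    """UCB1 child selection, as used in UCT-style MCTS.

    Returns the first unvisited child if any exists; otherwise returns the
    child maximizing mean value + c * sqrt(ln(N) / n).
    """
    total = sum(visit_counts)
    best, best_score = None, -math.inf
    for i, (n, w) in enumerate(zip(visit_counts, value_sums)):
        if n == 0:
            return i  # expand unvisited children before exploiting
        score = w / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = i, score
    return best


if __name__ == "__main__":
    print(dueling_q(1.0, [0.5, -0.5, 0.0]))
    print(double_q_target(1.0, [0.2, 0.8], [0.5, 0.3]))
    print(uct_select([2, 2], [1.0, 0.0]))
```

For example, with V(s) = 1.0 and advantages [0.5, -0.5, 0.0], `dueling_q` returns Q-values [1.5, 0.5, 1.0], whose mean is V(s). In decision-time planning, a selection rule like `uct_select` is applied repeatedly during simulated rollouts from the current dialogue state, and the action to execute is chosen only after the search finishes; background planning, by contrast, uses simulated experience to update the policy ahead of time.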

Connected Papers in EMNLP2020