k-Nearest Neighbor Monte-Carlo Control Algorithm for POMDP-based Dialogue Systems

Fabrice Lefèvre, Milica Gasic, Filip Jurcicek, Simon Keizer, François Mairesse, Blaise Thomson, Kai Yu and Steve Young

SIGDIAL Workshop on Discourse and Dialogue (SIGDIAL 2009)
Queen Mary University of London, September 11-12, 2009

Summary

In real-world applications, modelling dialogue as a POMDP requires the use of a summary space for the dialogue state representation to ensure tractability. Sub-optimal estimation of the value function governing the selection of system responses can then be obtained using a grid-based approach on the belief space. In this work, the Monte-Carlo control technique is extended so as to reduce training over-fitting and to improve robustness to semantic noise in the user input. This technique uses a database of belief vector prototypes to choose the optimal system action. A locally weighted $k$-nearest neighbor scheme is introduced to smooth the decision process by interpolating the value function, resulting in higher user simulation performance.