Gaussian Processes for Fast Policy Optimisation of POMDP-based Dialogue Managers

Milica Gasic, Filip Jurcicek, Simon Keizer, Francois Mairesse, Blaise Thomson, Kai Yu, Steve Young
University of Cambridge


Gaussian Processes provide a non-parametric Bayesian approach to function approximation. They allow prior knowledge about the correlations between function values to be incorporated through the choice of a kernel function, and they yield an estimate of the posterior variance, which models the uncertainty of the approximation. They have been successfully used for value function estimation in reinforcement learning for Markov Decision Processes (MDPs). Modelling dialogue as a Partially Observable Markov Decision Process (POMDP) enables an optimal dialogue policy, robust to speech recognition errors, to be learnt. However, a major challenge in POMDP policy learning is maintaining tractability, so the use of approximation is inevitable. This in turn creates the problem of estimating the quality of the approximation. We propose applying Gaussian Processes to reinforcement learning of optimal POMDP dialogue policies, firstly to make the learning process faster and secondly to obtain an estimate of the uncertainty of the approximation. We demonstrate the idea on a simple voice mail dialogue task and show how an adequate kernel function can speed up the learning process. We then apply this technique to a real-world tourist information dialogue task.
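
To make the role of the kernel and the posterior variance concrete, the following is a minimal sketch of generic Gaussian Process regression on scalar inputs. It is not the policy optimisation algorithm proposed in this paper. The squared-exponential kernel, the zero prior mean, the noise level and all function names (`sq_exp_kernel`, `gp_posterior` and the helpers) are assumptions chosen purely for illustration.

```python
import math

def sq_exp_kernel(x1, x2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel (an illustrative choice): nearby inputs
    are assumed to have highly correlated function values."""
    return signal_var * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def forward_solve(low, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y

def backward_solve(low, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def gp_posterior(xs, ys, x_star, noise_var=1e-6, kernel=sq_exp_kernel):
    """Posterior mean and variance at x_star, given observations (xs, ys)
    and a zero prior mean (an assumption of this sketch)."""
    n = len(xs)
    # Gram matrix of the training inputs, with observation noise on the diagonal.
    gram = [[kernel(xs[i], xs[j]) + (noise_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    low = cholesky(gram)
    k_star = [kernel(x, x_star) for x in xs]
    # mean = k_*^T (K + noise I)^{-1} y
    alpha = backward_solve(low, forward_solve(low, ys))
    mean = sum(k * a for k, a in zip(k_star, alpha))
    # var = k(x_*, x_*) - k_*^T (K + noise I)^{-1} k_*
    v = forward_solve(low, k_star)
    var = kernel(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)  # clamp tiny negative values from rounding

xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 0.5]
mean_near, var_near = gp_posterior(xs, ys, 1.0)   # at an observed input
mean_far, var_far = gp_posterior(xs, ys, 10.0)    # far from all observations
```

Under these assumptions, the posterior variance is close to zero at an observed input and returns to the prior variance far from the data, which is the sense in which the posterior models the uncertainty of the approximation. Changing `length_scale` alters how far each observation's influence extends, a simple instance of encoding prior knowledge through the kernel.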