A Two-tier User Simulation Model for Reinforcement Learning of Adaptive Referring Expression Generation Policies

Srinivasan Janarthanam and Oliver Lemon

SIGDIAL Workshop on Discourse and Dialogue (SIGDIAL 2009)
Queen Mary University of London, September 11-12, 2009


We present a new two-tier user simulation model for learning adaptive referring expression generation (REG) policies for spoken dialogue systems using reinforcement learning. Current user simulation models used for dialogue policy learning do not simulate users with different levels of domain expertise and do not respond to the referring expressions used by the system. The two-tier model has both of these features, which are crucial to learning an adaptive REG policy. We also show that the two-tier model simulates real user behaviour more closely than other baseline models, using the Dialogue Similarity measure based on Kullback-Leibler divergence.
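The abstract names a Dialogue Similarity measure based on Kullback-Leibler divergence but does not define it here. As a rough illustration only, the sketch below assumes one plausible reading: for each dialogue context, compare the real and simulated distributions over user actions with a symmetrised KL divergence, then average over contexts. The function names, the dictionary layout, the smoothing constant `eps`, and the example contexts and actions are all hypothetical and are not taken from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """D_KL(P || Q) for discrete distributions given as {action: probability}.

    Assumption: a small additive smoothing term `eps` is applied and both
    distributions are renormalised, so actions unseen in one distribution
    do not produce a division by zero.
    """
    actions = sorted(set(p) | set(q))
    p_smoothed = [p.get(a, 0.0) + eps for a in actions]
    q_smoothed = [q.get(a, 0.0) + eps for a in actions]
    p_total, q_total = sum(p_smoothed), sum(q_smoothed)
    return sum(
        (pi / p_total) * math.log((pi / p_total) / (qi / q_total))
        for pi, qi in zip(p_smoothed, q_smoothed)
    )


def dialogue_similarity(real, simulated):
    """Average symmetrised KL divergence over contexts present in both models.

    `real` and `simulated` map a context label to a distribution over user
    actions. Lower values mean the simulated behaviour is closer to the real
    behaviour; 0.0 means the distributions are identical in every context.
    """
    contexts = sorted(set(real) & set(simulated))
    if not contexts:
        raise ValueError("no shared contexts to compare")
    total = 0.0
    for c in contexts:
        forward = kl_divergence(real[c], simulated[c])
        backward = kl_divergence(simulated[c], real[c])
        total += 0.5 * (forward + backward)
    return total / len(contexts)


# Hypothetical example: user responses after a technical vs. a descriptive
# referring expression.
real_users = {
    "technical_expression": {"clarify": 0.6, "acknowledge": 0.4},
    "descriptive_expression": {"clarify": 0.1, "acknowledge": 0.9},
}
close_model = {
    "technical_expression": {"clarify": 0.55, "acknowledge": 0.45},
    "descriptive_expression": {"clarify": 0.15, "acknowledge": 0.85},
}
unresponsive_model = {
    "technical_expression": {"clarify": 0.3, "acknowledge": 0.7},
    "descriptive_expression": {"clarify": 0.3, "acknowledge": 0.7},
}
```

Under this reading, a simulation whose responses depend on the referring expression (`close_model`) should score lower, meaning closer to the real data, than one that ignores it (`unresponsive_model`).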