On NoMatchs, NoInputs and BargeIns: Do Non-Acoustic Features Support Anger Detection?

Alexander Schmitt, Tobias Heinroth and Jackson Liscombe

SIGDIAL Workshop on Discourse and Dialogue (SIGDIAL 2009)
Queen Mary University of London, September 11-12, 2009


Most studies on speech-based emotion recognition rely on prosodic and acoustic features alone and employ artificially acted corpora, so their results cannot be generalized to telephone-based speech applications. In contrast, we present an approach based on utterances from 1,911 calls to a deployed telephone-based speech application that incorporates additional dialogue features, NLU features, and ASR features into the emotion recognition process. Depending on the task, non-acoustic features add 3.41% in classification accuracy compared to using acoustic features alone.
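The fusion idea in the abstract can be sketched as simple early fusion: concatenating per-utterance acoustic features with dialogue, ASR, and NLU features into one classifier input vector. The feature names below are hypothetical illustrations, not the paper's actual feature set:

```python
def fuse_features(acoustic, non_acoustic):
    """Concatenate acoustic and non-acoustic feature groups into a single
    vector (fixed key order) for a per-utterance anger classifier."""
    vec = []
    for group in (acoustic, non_acoustic):
        for key in sorted(group):          # sort keys for a stable ordering
            vec.append(float(group[key]))
    return vec

# Hypothetical example utterance from a telephone dialogue system:
acoustic = {"f0_mean": 182.3, "energy_mean": 0.41, "jitter": 0.012}
non_acoustic = {
    "asr_confidence": 0.55,   # ASR feature: recognizer confidence
    "n_nomatch": 2,           # dialogue feature: NoMatch events so far
    "n_noinput": 0,           # dialogue feature: NoInput events so far
    "barged_in": 1,           # dialogue feature: caller barged in
}

x = fuse_features(acoustic, non_acoustic)
# x is a 7-dimensional feature vector ready for any standard classifier
```

Repeated NoMatch and NoInput events and barge-ins are cheap to log in a deployed system, which is what makes such non-acoustic features attractive alongside the acoustic ones.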