In this paper we propose a new technique that enhances emotion recognition by combining, in different ways, what we call emotion predictions. We call the technique F2 because the combination is based on a double fusion process. The input to the first fusion stage is the output of a number of classifiers, each of which deals with a different type of information about each sentence uttered by the user. The output of this stage is the input to the second fusion stage, which outputs the most likely emotional category. Experiments were carried out using a previously developed spoken dialogue system designed for the fast-food domain. Results obtained with three and two emotional categories show that our technique outperforms standard single fusion by 2.25% and 3.35% absolute, respectively.
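
To make the two-stage data flow concrete, the following is a minimal sketch of a double fusion, not the method as implemented in this work. It assumes that each classifier emits a probability-like score per emotional category, that the first stage applies several common combination rules (mean, product, max) to those scores, and that the second stage averages the first-stage results before picking the top category. The category labels, the rule set, and all function names are hypothetical and chosen only for illustration.

```python
from math import prod

# Hypothetical category labels; the abstract does not name the categories.
CATEGORIES = ["neutral", "tired", "angry"]


def normalize(scores):
    """Rescale non-negative scores so they sum to 1 (assumes a non-zero total)."""
    total = sum(scores)
    return [s / total for s in scores]


def mean_rule(predictions):
    """Average the per-category scores across all inputs."""
    return normalize([sum(col) / len(col) for col in zip(*predictions)])


def product_rule(predictions):
    """Multiply the per-category scores across all inputs."""
    return normalize([prod(col) for col in zip(*predictions)])


def max_rule(predictions):
    """Keep the highest per-category score across all inputs."""
    return normalize([max(col) for col in zip(*predictions)])


def double_fusion(predictions, first_stage=(mean_rule, product_rule, max_rule)):
    """Fuse classifier outputs twice and return (best category, fused scores).

    Stage 1: every rule in `first_stage` fuses the classifier predictions.
    Stage 2: the stage-1 outputs are fused again (here, by averaging; an
    assumption made for this sketch).
    """
    fused_once = [rule(predictions) for rule in first_stage]
    fused_twice = mean_rule(fused_once)
    best = max(range(len(fused_twice)), key=fused_twice.__getitem__)
    return CATEGORIES[best], fused_twice


# Made-up scores from three hypothetical classifiers for one utterance,
# e.g. one per information source (acoustic, lexical, dialogue context).
predictions = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]
label, scores = double_fusion(predictions)
print(label, [round(s, 3) for s in scores])
```

A single fusion would correspond to stopping after one rule in stage 1. The sketch only shows where the second stage sits in the pipeline and makes no claim about which rules or weights yield the reported gains.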