This paper addresses the problem of determining which differences between two versions of a system affect its behavior, and how. We propose two algorithms; the simpler one serves as a scaffold for the Join Evaluation and Differences Identification (JEDI) algorithm. JEDI identifies the differences between systems that relate to performance by casting the problem as multi-task feature selection. JEDI also provides evidence of the usefulness of a recent method based on L1/L2 regularization (Obozinski et al., 2006). We evaluate our approaches against manually annotated success criteria from real users interacting with five different spoken user interfaces that provide bus information.
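To make the L1/L2 formulation concrete, here is a minimal sketch of multi-task feature selection with an L1/L2 penalty. It is not JEDI itself. It assumes that each row of `X` describes one dialogue through features encoding system differences, that each column of `Y` is one success criterion (one task), and that the loss is squared error. It solves the problem with proximal gradient descent (ISTA) rather than the path-following algorithm of Obozinski et al. (2006). The names `group_soft_threshold`, `l1l2_multitask` and `selected_features` are hypothetical.

```python
import numpy as np


def group_soft_threshold(W, tau):
    """Proximal operator of tau * sum_j ||W[j, :]||_2.

    Each row (one feature) is shrunk jointly across all tasks, and rows
    whose norm falls below tau are set exactly to zero.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return W * scale


def l1l2_multitask(X, Y, lam, n_iter=500):
    """Minimise (1/(2n)) ||Y - X W||_F^2 + lam * sum_j ||W[j, :]||_2 with ISTA.

    X: (n_samples, n_features) feature matrix (assumed: encoded system differences).
    Y: (n_samples, n_tasks) targets (assumed: one column per success criterion).
    """
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    # Step size 1/L, where L is the Lipschitz constant of the squared-loss gradient.
    lipschitz = max(np.linalg.norm(X, 2) ** 2 / n, 1e-12)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y) / n          # gradient of the smooth loss term
        W = group_soft_threshold(W - step * grad, step * lam)  # proximal step
    return W


def selected_features(W, tol=1e-8):
    """Indices of features with a nonzero row, i.e. kept for every task."""
    return np.flatnonzero(np.linalg.norm(W, axis=1) > tol)
```

Because the penalty sums the L2 norms of the rows of `W`, each candidate difference is either retained for all tasks or discarded for all of them. That joint behavior is what distinguishes L1/L2 selection from running a separate L1 (lasso) fit per success criterion.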