Dan Bohus
Senior Principal Researcher, Perception and Interaction Group
Microsoft Research, Redmond, Washington, US
Situated Interaction
Abstract
Physically situated dialog is a complex, multimodal affair that goes well beyond the spoken word. When interacting with each other, people incrementally coordinate their actions to simultaneously resolve several different problems: they manage engagement, coordinate on taking turns, recognize intentions, and establish and maintain common ground as a basis for contributing to the conversation. A wide array of non-verbal signals are brought to bear. Proximity and body pose, attention and gaze, head nods and hand gestures, prosody and facial expressions, all play important roles in the intricate, mixed-initiative, fluidly coordinated process we call interaction. And just as advances in speech recognition opened up the field of spoken dialog systems a couple of decades ago, advances in vision and other perceptual technologies are today opening up new horizons -- we are starting to be able to build machines that can understand these social signals and the physical world around them, and begin to participate in physically situated interactions and collaborations with people.
In this talk, using a number of research vignettes from my work, I will draw attention to some of the challenges and opportunities that lie ahead of us in this exciting space. In particular, I will discuss issues with managing engagement and turn-taking in multiparty open-world settings, and more generally highlight the importance of timing and fine-grained coordination in situated interaction. Finally, I will conclude by describing a framework that promises to simplify the development of physically situated interactive systems and enable more research and faster progress in this area.
Biography
Dan Bohus is a Senior Principal Researcher in the Perception and Interaction Group at Microsoft Research. His work centers on the study and development of computational models for physically situated spoken language interaction and collaboration. The long-term question that shapes his research agenda is: how can we enable interactive systems to reason more deeply about their surroundings and seamlessly participate in open-world, multiparty dialog and collaboration with people? Prior to joining Microsoft Research, Dan obtained his Ph.D. from Carnegie Mellon University.
Mirella Lapata
Professor, School of Informatics
University of Edinburgh, UK
Learning Natural Language Interfaces with Neural Models
Abstract
In Spike Jonze's futuristic film "Her", Theodore, a lonely writer, forms a strong emotional bond with Samantha, an operating system designed to meet his every need. Samantha can carry on seamless conversations with Theodore, exhibits a perfect command of language, and is able to take on complex tasks. She filters his emails for importance, allowing him to deal with information overload, she proactively arranges the publication of Theodore's letters, and is able to give advice using common sense and reasoning skills.
In this talk I will present an overview of recent progress on learning natural language interfaces which might not be as clever as Samantha but nevertheless allow users to interact with various devices and services using everyday language. I will address the structured prediction problem of mapping natural language utterances onto machine-interpretable representations and outline the various challenges it poses: for example, the mapping from natural language to formal language is highly non-isomorphic, data for model training is scarce, and natural language can express the same information need in many different ways. I will describe a general modeling framework based on neural networks which tackles these challenges and improves the robustness of natural language interfaces.
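As a concrete (hypothetical) illustration of this kind of framework, the sketch below implements a bare-bones encoder-decoder semantic parser in PyTorch that maps utterance tokens to logical-form tokens. The architecture, vocabulary sizes, and dimensions are illustrative assumptions for exposition, not details of the models presented in the talk.

```python
# Minimal encoder-decoder sketch: natural-language tokens -> logical-form tokens.
# Vocabulary sizes and dimensions are toy values, not details from the talk.
import torch
import torch.nn as nn

class Seq2SeqParser(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the utterance into a summary state.
        _, state = self.encoder(self.src_embed(src_ids))
        # Decode the logical form conditioned on that state (teacher forcing).
        dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), state)
        return self.out(dec_out)  # per-step scores over target tokens

# Toy usage: utterance token ids -> logical-form token ids.
model = Seq2SeqParser(src_vocab=100, tgt_vocab=50)
src = torch.randint(0, 100, (1, 6))  # e.g. "show flights from boston to denver"
tgt = torch.randint(0, 50, (1, 8))   # e.g. tokens of a lambda-calculus form
logits = model(src, tgt)
print(logits.shape)                   # torch.Size([1, 8, 50])
```

In practice, parsers of this family typically add attention, copy mechanisms, and grammar-constrained decoding so that outputs are guaranteed to be well-formed in the target formalism.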
Biography
Mirella Lapata is professor of natural language processing in the School of Informatics at the University of Edinburgh. Her research focuses on getting computers to understand, reason with, and generate natural language. She is the first recipient (2009) of the British Computer Society and Information Retrieval Specialist Group (BCS/IRSG) Karen Spärck Jones Award and a Fellow of the Royal Society of Edinburgh. She has also received best paper awards at leading NLP conferences and has served on the editorial boards of the Journal of Artificial Intelligence Research, the Transactions of the ACL, and Computational Linguistics. She was president of SIGDAT (the group that organizes EMNLP) in 2018.
Helen Meng
Professor, Department of Systems Engineering and Engineering Management
Chinese University of Hong Kong, China
The Many Facets of Dialog
Abstract
Dialog is a most fascinating form of human communication. The back-and-forth exchanges convey the speaker's message to the listener, and the listener can derive information about the speaker's thoughts, intent, well-being, emotions and much more. This talk presents an overview of dialog research that concerns our group at The Chinese University of Hong Kong. In the domain of education and learning, we have been recording in-class student group discussions in the flipped-classroom setting of a freshman elite mathematics course. We investigate features in the weekly, within-group dialogs that may relate to class performance and learning efficacy. In the domain of e-commerce, we are developing dialog models based on approximately 20 million conversation turns, to support a virtual shopping assistant in customer inquiries and orders, logistics tracking, etc. In the domain of health and wellbeing, we are capturing and analysing dialogs between health professionals (or their virtual equivalent) and subjects in cognitive screening tests.

We also conduct research in both semantic interpretation and dialog state tracking, as well as affective design of virtual conversational assistants. For the former, we have developed a Convex Polytopic Model for extracting a knowledge representation from user inputs in dialog turns by generating a compact convex polytope to enclose all the data points projected to a latent semantic space. The polytope vertices represent extracted semantic concepts. Each user input can then be "interpreted" as a sequence of polytope vertices which represent the user's goals and dialog states. For the latter, we have developed a multimodal, multi-task, deep learning framework to infer the user's emotive state and emotive state change simultaneously. This enables virtual conversational assistants to understand the emotive state in the user's input and to generate an appropriate emotive system response in the dialog turn, which will further influence the user's emotive state in the subsequent dialog turn. Such an affective design will be able to enhance user experience in conversational dialogs with intelligent virtual assistants.
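To make the geometric intuition behind the Convex Polytopic Model more tangible, here is a loose sketch in Python (NumPy/SciPy) on assumed toy data: bag-of-words vectors are projected into a low-dimensional latent semantic space, and the vertices of a polytope enclosing the projected points are treated as extracted concepts. Note this illustration uses the exact convex hull, whereas the published model fits a compact enclosing polytope; the data and dimensions are invented for exposition.

```python
# Geometric sketch of the convex-polytopic idea on toy data.
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
X = rng.random((200, 30))  # 200 toy utterances x 30 vocabulary terms

# LSA-style projection via truncated SVD, down to 2 latent dimensions.
U, S, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
Z = U[:, :2] * S[:2]       # utterance coordinates in latent space

hull = ConvexHull(Z)       # polytope enclosing all projected points
concepts = Z[hull.vertices]  # vertices ~ candidate semantic concepts
print(f"{len(hull.vertices)} concept vertices enclose all {len(Z)} utterances")

# "Interpret" one utterance by its nearest concept vertices.
d = np.linalg.norm(concepts - Z[0], axis=1)
print("nearest concepts for utterance 0:", hull.vertices[np.argsort(d)[:3]])
```

Each utterance can then be described in terms of the vertices nearest to it (or, in the full model, by its weights over the polytope vertices), giving a compact, interpretable handle on the user's goals and dialog states.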
Biography
Helen Meng is the Patrick Huen Wing Ming Professor of Systems Engineering and Engineering Management at The Chinese University of Hong Kong (CUHK). She is the Founding Director of the CUHK Ministry of Education (MoE)-Microsoft Key Laboratory for Human-Centric Computing and Interface Technologies (since 2005) and of the Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems (since 2006), and Co-Director of the Stanley Ho Big Data Decision Analytics Research Center (since 2013). Previously, she served as Associate Dean (Research) of the CUHK Faculty of Engineering, Chairman of the Department of Systems Engineering and Engineering Management, Editor-in-Chief of the IEEE Transactions on Audio, Speech and Language Processing, Member of the IEEE Signal Processing Society Board of Governors, and ISCA Board Member; she is presently a Member of the ISCA International Advisory Council. She was elected APSIPA's inaugural Distinguished Lecturer 2012-2013 and ISCA Distinguished Lecturer 2015-2016.

Her awards include the Ministry of Education Higher Education Outstanding Scientific Research Output Award 2009, Hong Kong Computer Society's inaugural Outstanding ICT Woman Professional Award 2015, Microsoft Research Outstanding Collaborator Award 2016 (1 in 32 worldwide), IEEE ICME 2016 Best Paper Award, IBM Faculty Award 2016, HKPWE Outstanding Women Professionals and Entrepreneurs Award 2017 (1 in 20 since 1999), Hong Kong ICT Silver Award 2018 in Smart Inclusion, and the CogInfoCom 2018 Best Paper Award. Helen received all her degrees from MIT. Her research interests include big data decision analytics and artificial intelligence, especially speech and language technologies to support multilingual and multimodal human-computer interaction.

Helen has given invited/keynote presentations including the INTERSPEECH 2018 Plenary Talk, the World Economic Forum Global Future Council 2018, the Taihe Workshop on Building Stakeholder Networks on AI Ethics and Governance 2019, and the World Peace Forum 2019. She has served in numerous government appointments, including Chairlady of the Research Grants Council's Assessment Panel for Competitive Research Funding Schemes for the Local Self-financing Degree Sector, Chairlady of the Working Party on Manpower Survey of the Information/Innovation Technology Sector (since 2013), and Steering Committee Member of Hong Kong's Electronic Health Record (eHR) Sharing. Helen is a Fellow of HKCS, HKIE, IEEE and ISCA.