Enabling humans to interact with computers in natural language to achieve goals such as accessing factual information, booking a flight, reserving a table at a restaurant, or (ultimately) holding a casual or intellectual conversation with an AI has been among the central goals of both NLP and AI research. Many people use natural language interfaces such as Siri, Google Assistant, Alexa, and Cortana in their daily lives. In addition, there are many equally promising and impactful but relatively less explored verticals, such as health, real estate, and customer service. Although promising results have been achieved in both academia and industry, many research problems remain before such natural language interfaces can be fully realized: improving their reliability (accuracy), understanding the capabilities that systems need and the bottlenecks they hit on benchmark datasets, and producing responses that are informative and engaging enough for users in conversational settings.
In this talk, I will first lay out the range of potential applications of natural language interfaces, along with the central challenges they pose, under two settings: (1) factual (single-turn) response generation using structured knowledge, and (2) conversational (multi-turn) response generation using unstructured and/or structured knowledge. For the first setting, I will discuss how our proposed general approach to answer type inference for factual queries over a knowledge base can help improve the performance of existing machine learning systems. For the second setting, I will first introduce a principled neural architecture that can hold casual conversations with humans. This approach extends pointer-generator networks by allowing the decoder to hierarchically attend to and copy from external unstructured knowledge, making the conversation more engaging and informative for users. Then, I will introduce our most recent work on multi-domain task-oriented conversations, which can address more essential human needs (e.g., booking a hotel and reserving a table at a nearby restaurant in a single conversation) than the previous model. I will discuss our proposed approach, neural assistant, a single neural model that jointly generates both the action to be taken by the system and the text response for users through latent knowledge reasoning over relational databases, without any intermediate symbolic query execution. I will conclude the talk by discussing further research frontiers, such as building a single end-to-end model for task-oriented conversations that can learn to reason over more heterogeneous knowledge sources (e.g., a relational database accompanied by a set of relevant unstructured text snippets).
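As a rough illustration of the copy mechanism that the conversational model builds on, the sketch below shows the standard pointer-generator mixture for a single decoding step: the output distribution blends the decoder's vocabulary distribution with the attention weights over source tokens, so that words outside the vocabulary can still be copied. It does not reproduce the hierarchical attention over external knowledge described in the talk. The function name, argument names, and toy numbers are illustrative assumptions, not taken from the work itself.

```python
from collections import defaultdict


def pointer_generator_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix a generation distribution with a copy distribution for one step.

    p_gen         -- probability of generating from the vocabulary (0..1)
    vocab_dist    -- dict mapping vocabulary word -> generation probability
    attention     -- attention weights over source positions (sums to 1)
    source_tokens -- source tokens aligned with the attention weights
    (All names here are hypothetical, chosen for illustration.)
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")

    final = defaultdict(float)
    # Generation path: scale the vocabulary distribution by p_gen.
    for word, prob in vocab_dist.items():
        final[word] += p_gen * prob
    # Copy path: each source position contributes its attention weight,
    # scaled by (1 - p_gen), to the token found at that position.
    for token, weight in zip(source_tokens, attention):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)


# Toy example: "Ritz" is not in the vocabulary but can still be copied.
vocab_dist = {"the": 0.5, "hotel": 0.3, "is": 0.2}
source_tokens = ["hotel", "Ritz", "hotel"]
attention = [0.2, 0.7, 0.1]
final = pointer_generator_distribution(0.6, vocab_dist, attention, source_tokens)
```

In this toy case the out-of-vocabulary token "Ritz" receives probability mass only through the copy path, which is the property that lets a decoder surface names and facts from external text.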