Automating user experience via natural language utterances, such as question answering, while offering realistic user interfaces is a nontrivial and open problem. Traditionally, the problem is divided into two sub-problems based on the nature of the data source, each with a unique challenge: structured data, where the input is high dimensional (e.g., SQL queries), and unstructured data, where the input is noisy (e.g., Wikipedia texts). In both cases, previous work has focused on designing fully automated machine learning systems in closed loops, where humans are kept out of the loop and the data is already accessible. These approaches fail when the underlying data source is unavailable or highly dynamic (e.g., web pages).
In the first part of my defense, I will introduce how we can build natural language interfaces on traditional structured data sources, such as relational databases and Wikipedia documents. I will present our deep learning models that leverage large sources of labeled and unlabeled data to generalize to unseen scenarios. When the underlying database or API is not accessible, as in surfing web pages, we need new approaches that can learn from more abstract interfaces. In the second part, I will focus on our more recent work on using web interfaces to accomplish concrete tasks, such as movie ticket booking, without accessing the underlying API or data. I will discuss how we train deep reinforcement learning policies that navigate web pages while conversing with users over multiple turns to collect more information.