Abstract
Task-oriented dialogue systems (TODS) have existed for a few decades in the form of virtual assistants such as Amazon Alexa, Apple Siri, and Microsoft XiaoIce. Traditional approaches adopted neural networks in a modular architecture, combining components that cover a wide range of Natural Language Processing (NLP) capabilities: intent classification of the user's query as Natural Language Understanding (NLU), Dialogue State Tracking (DST) to maintain a global semantic state of the dialogue, and response generation as Natural Language Generation (NLG). More recent approaches are steering towards end-to-end systems that rely on a single module for the complete TODS. As such, this task is notably difficult to tackle in a zero-shot scenario, where so many different capabilities and so much domain knowledge are required for the system to accomplish its goal.

Recently, with the emergence of large language models, the field of NLP has witnessed a strong paradigm shift in the way many existing tasks are tackled. The scalability of transformer models, along with progress in computing hardware, has enabled language models to scale up to billions and even hundreds of billions of parameters. These large language models undergo extensive pre-training and acquire an enormous amount of general knowledge, to the point that many tasks which previously required task-specific data, such as question answering or summarization, can now be tackled with little to no additional data. Along with that, researchers have recently observed unanticipated emergent abilities in large language models beyond a certain scale. Although the exploration of large language models is still in its early stages, there is a need to investigate more complex and composite tasks, such as task-oriented dialogue.
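The modular pipeline described above can be sketched as follows. This is a minimal, hypothetical illustration: the component names, rule-based logic, and slot names are assumptions for exposition, not the thesis's implementation (real systems use trained models for each stage).

```python
# Hypothetical sketch of a modular TODS pipeline: NLU -> DST -> NLG.
# The rule-based logic below is purely illustrative.

def nlu(utterance: str) -> str:
    """Natural Language Understanding: classify the user's intent."""
    # A real system would use a trained intent classifier here.
    return "book_restaurant" if "restaurant" in utterance.lower() else "unknown"

def dst(state: dict, intent: str, utterance: str) -> dict:
    """Dialogue State Tracking: update the global semantic state."""
    state = dict(state)  # keep the update functional
    state["intent"] = intent
    if "italian" in utterance.lower():
        state["cuisine"] = "italian"  # illustrative slot filling
    return state

def nlg(state: dict) -> str:
    """Natural Language Generation: produce the system response."""
    if state.get("intent") == "book_restaurant":
        return f"Booking an {state.get('cuisine', 'any-cuisine')} restaurant for you."
    return "Could you rephrase that?"

def dialogue_turn(state: dict, utterance: str) -> tuple[dict, str]:
    """One full turn through the modular pipeline."""
    intent = nlu(utterance)
    state = dst(state, intent, utterance)
    return state, nlg(state)
```

An end-to-end system, by contrast, would collapse all three stages into a single model that maps the dialogue history directly to a response.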
Many users of large language models wrongly believe that these models are all-powerful and can easily perform any given task, unaware of the hidden drawbacks of such direct interactions, such as hallucination and unfaithful information in the context of task-oriented dialogue.
In this thesis, we investigate the potential of instruction-tuned large language models (LLMs) to act as end-to-end task-oriented dialogue systems in a zero-shot scenario, meaning without model parameter updates and without any additional task-specific or domain-specific data. We propose InstructTODS, the first framework to efficiently leverage these models for end-to-end task-oriented dialogue. Through our investigation, we show that InstructTODS performs on par with state-of-the-art fine-tuned TODS baselines, while removing the resource and training requirements and remaining adaptable across domains and tasks.
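A zero-shot, end-to-end turn of the kind described above amounts to constructing a single instruction prompt from the dialogue history and any retrieved knowledge, then querying an instruction-tuned LLM. The sketch below only builds such a prompt; the instruction wording and field layout are assumptions for illustration, not the prompt format used by InstructTODS.

```python
# Illustrative zero-shot prompt construction for one end-to-end TODS turn.
# The instruction text and structure are assumed, not taken from the thesis.

def build_prompt(history: list[str], knowledge: str, user_utterance: str) -> str:
    """Assemble dialogue history, external knowledge, and the new user
    utterance into a single instruction prompt for an LLM."""
    turns = "\n".join(history)
    return (
        "You are a task-oriented dialogue assistant. Using only the "
        "knowledge below, respond helpfully and faithfully; do not invent facts.\n"
        f"Knowledge:\n{knowledge}\n"
        f"Dialogue history:\n{turns}\n"
        f"User: {user_utterance}\n"
        "Assistant:"
    )
```

Grounding the response in an explicit knowledge field is one common way to mitigate the hallucination and unfaithfulness issues noted above.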
| Date of Award | 2023 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Pascale Ngan FUNG (Supervisor) |