Document: Towards Emotionally Intelligent Task-oriented Dialogue Systems
| Title: | Towards Emotionally Intelligent Task-oriented Dialogue Systems |
| Bookmark URL: | https://docserv.uni-duesseldorf.de/servlets/DocumentServlet?id=72702 |
| URN (NBN): | urn:nbn:de:hbz:061-20260330-101611-0 |
| Collection: | Dissertations |
| Language: | English |
| Document type: | Academic theses » Dissertation |
| Media type: | Text |
| Author: | Feng, Shutong [Author] |
| Contributors: | Prof. Dr. Gasic, Milica [Reviewer]; Prof. Dr. Higashinaka, Ryuichiro [Reviewer] |
| Dewey Decimal Classification: | 000 Computer science, information science, general works » 004 Data processing; computer science |
| Description: | The development of conversational agents has long been driven by the ambition to create systems that interact with humans in ways that are both functional and emotionally attuned. While task-oriented dialogue (ToD) systems are designed to achieve specific goals such as booking a restaurant or suggesting a tourist attraction, they typically overlook the emotional dimension of human communication. Yet, beyond transmitting factual information, interlocutors continually signal and interpret intentions through emotions. These signals influence error recovery and user satisfaction, both of which are indispensable in human communication and crucial to the system's functionality. The omission of emotions in ToD systems, while keeping the system centred on functional goals, limits user satisfaction and reduces robustness in real-world interactions.
In this thesis, we investigate how emotion can be integrated into ToD systems to enhance their effectiveness. The work unfolds through a sequence of contributions that move from the preparation of resources to the development of robust modelling methods, and finally to the incorporation of emotion into the full ToD pipeline. First, we address the scarcity of suitable resources by introducing EmoWOZ, the first large-scale corpus of ToDs annotated with a dedicated user emotion taxonomy designed to capture the subtle affective behaviours unique to ToDs. EmoWOZ contains both human-human and human-machine conversations covering a wide spectrum of user emotions. Annotated with high-quality labels validated through rigorous quality control, EmoWOZ provides a foundation for studying user emotion in ToDs and supports downstream ToD modelling tasks. Second, we explore methods for modelling emotion in ToDs, pursuing two complementary directions. On the one hand, we develop lightweight supervised models by adapting chit-chat emotion recognition in conversations (ERC) models to the task-oriented setting. Our framework tackles the challenge from three angles (data, features, and objectives) through targeted data augmentation, the integration of task-specific and emotion-aware features, and novel optimisation objectives that account for relationships among emotion labels. On the other hand, we examine the potential of large language models (LLMs) as flexible, general-purpose emotion recognisers. We evaluate them across multiple dialogue settings, including ToDs, chit-chat, and psychological interviews, and test their effectiveness under low-resource scenarios and speech recognition errors.
Together, these investigations establish a complementary toolkit for emotion recognition: specialised supervised models offer accuracy and efficiency in well-defined ToD settings, whereas LLMs provide flexibility and robustness when dealing with broader emotion taxonomies and less constrained dialogue scenarios. Finally, we integrate emotions into the full ToD pipeline and systematically investigate optimal design choices for emotionally intelligent ToD systems. In our first line of work, we focus on the practical challenges of infusing emotion awareness into both modular and end-to-end ToD systems. This is achieved by extending the dialogue state to include user emotion, expanding the dialogue policy to include affective actions, and conditioning the natural language generation both semantically and emotionally. For the modular system, we further employ multi-objective online reinforcement learning (RL) to optimise task success and emotional appropriateness jointly. Evaluations with simulated and real users demonstrate that emotion-aware systems not only improve task performance but also enrich the user's emotional experience, confirming the practical benefits of incorporating emotion into ToDs. Building on these insights, our second line of work systematically explores a wider range of design considerations spanning architecture, representation, and optimisation. We present LUSTER (LLM-based Unified System for Task-oriented dialogue with End-to-end Reinforcement learning). LUSTER achieves significantly improved task efficiency and user satisfaction by leveraging an LLM-based end-to-end architecture, fully lexicalised representations, and online multi-objective RL optimisation.
Together, the advances discussed in this thesis demonstrate that emotion can be seamlessly integrated into ToD systems, not only enhancing their effectiveness and robustness but also bringing them closer to the goal of human-like, emotionally intelligent interaction. |
| License: | This work is licensed under a Creative Commons Attribution 4.0 International License |
| Department / Institution: | Faculty of Mathematics and Natural Sciences » Computer Science (WE Informatik) |
| Document created on: | 30.03.2026 |
| Files modified on: | 30.03.2026 |
| Doctoral application submitted on: | 20.11.2025 |
| Date of doctorate: | 04.02.2026 |
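
The abstract mentions extending the dialogue state to include user emotion and jointly optimising task success and emotional appropriateness. The following is a minimal sketch of that idea, not the thesis's implementation: the class `DialogueState`, the function `combined_reward`, the emotion labels, the reward values, and the weights are all hypothetical, and the weighted sum is only one simple way to combine two objectives (the thesis may use a different multi-objective scheme).

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Dialogue state with an added user-emotion field (illustrative only)."""
    slots: dict = field(default_factory=dict)  # task constraints, e.g. {"area": "centre"}
    user_emotion: str = "neutral"              # hypothetical emotion label


# Hypothetical per-turn emotion rewards; these are not values from the thesis.
EMOTION_REWARD = {"satisfied": 1.0, "neutral": 0.0, "dissatisfied": -1.0}


def combined_reward(task_success: bool, state: DialogueState,
                    w_task: float = 0.5, w_emotion: float = 0.5) -> float:
    """Weighted-sum combination of a task reward and an emotion reward."""
    task_r = 1.0 if task_success else -1.0
    # Unknown labels fall back to a neutral reward of 0.0.
    emo_r = EMOTION_REWARD.get(state.user_emotion, 0.0)
    return w_task * task_r + w_emotion * emo_r


if __name__ == "__main__":
    happy = DialogueState(slots={"area": "centre"}, user_emotion="satisfied")
    unhappy = DialogueState(slots={"area": "centre"}, user_emotion="dissatisfied")
    print(combined_reward(True, happy))
    print(combined_reward(True, unhappy))
```

Under these assumed weights, a successful dialogue that leaves the user dissatisfied scores lower than one that leaves the user satisfied, which is the kind of trade-off a jointly optimised policy would be pushed to account for.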

