Multimodal Dialogue System
Project Overview
Left: MuDiS HRI experiment at the JAHIR demonstrator. Center: Christoph Mayer (I9) working with the Mitsubishi robot. Right: Workspace. Photos: MK
The main goal of MuDiS is to develop a Multimodal Dialogue System that can be adapted quickly to a wide range of scenarios. In this interdisciplinary project, we unite researchers from diverse areas, including computational linguistics, computer science, electrical engineering, and psychology. The different research lines of MuDiS reflect the interdisciplinary character of the project. On the one hand, MuDiS collects data from human-human experiments to gain new insights into multimodal human interaction. On the other hand, the project develops a new system architecture for multimodal systems, which includes new components for emotion recognition, multimodal fusion, and dialogue management.
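To make the architecture idea more concrete, the following is a minimal sketch, not the actual MuDiS implementation, of how time-stamped recognizer outputs could be fused and handed to a dialogue manager. All class, function, and parameter names (ModalityEvent, fuse, DialogueManager, window) are hypothetical.

# Illustrative sketch only -- not the actual MuDiS code base.
# All names (ModalityEvent, fuse, DialogueManager) are hypothetical.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModalityEvent:
    modality: str      # e.g. "speech" or "gesture"
    content: str       # recognizer hypothesis, e.g. "take that brick"
    timestamp: float   # seconds since experiment start


def fuse(speech: ModalityEvent,
         gestures: List[ModalityEvent],
         window: float = 1.5) -> Optional[ModalityEvent]:
    """Late fusion: attach the temporally closest gesture to a speech event."""
    candidates = [g for g in gestures
                  if abs(g.timestamp - speech.timestamp) <= window]
    if not candidates:
        return None
    return min(candidates, key=lambda g: abs(g.timestamp - speech.timestamp))


class DialogueManager:
    """Maps fused multimodal input to a system action (greatly simplified)."""

    def react(self, speech: ModalityEvent, gesture: Optional[ModalityEvent]) -> str:
        if "brick" in speech.content and gesture is not None:
            return f"robot: pick object referenced by {gesture.content}"
        return "system: please rephrase or point at the object"


if __name__ == "__main__":
    dm = DialogueManager()
    speech = ModalityEvent("speech", "take that brick", timestamp=12.3)
    gestures = [ModalityEvent("gesture", "pointing@(0.4, 0.7)", timestamp=12.1)]
    print(dm.react(speech, fuse(speech, gestures)))

In this sketch, late fusion over time stamps keeps the individual recognizers independent of each other, which is one way such a system can stay quickly adaptable to new scenarios.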
A cornerstone of project MuDiS was a system designed to conduct a multimodal conversation and show some cognitive capabilities while, at the same time, passing commands to robots and receiving data. We developed a system that learns procedures, the structure of items and related conditions, and the purpose of actions (i.e., cause and effect) simply from explanations given in natural language (speech and text). To prove its understanding of spoken input, it visualizes the acquired knowledge graphically and answers related questions.
Moreover, it has rudimentary skills to reason about the acquired knowledge. Beyond that, it offers broad interfacing capabilities to other systems, making it a handy platform for quickly passing speech commands to machines or software components, e.g. the Nintendo Wiimote, the Ugobe Pleo, a WowWee flying model, or the AMSTracker in MacBooks; it can even read RSS feeds from news sites. Naturally, it integrates virtually effortlessly into CoTeSys demonstrators, including JAHIR; a sketch of such a dispatch layer follows below.
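As an illustration of this interfacing role, a simple dispatch layer might map recognized utterances to device adapters. The adapter classes and method names below (PleoAdapter, FeedAdapter, CommandDispatcher) are hypothetical placeholders, not the project's actual interfaces.

# Illustrative sketch of a speech-command dispatch layer, assuming a simple
# keyword grammar; device adapter names and methods are hypothetical.
from typing import Callable, Dict


class PleoAdapter:
    """Hypothetical stand-in for a wireless link to the Ugobe Pleo robot."""
    def walk(self) -> None:
        print("Pleo: walking forward")


class FeedAdapter:
    """Hypothetical stand-in for an RSS reader component."""
    def read_headlines(self) -> None:
        print("Reading latest headlines aloud ...")


class CommandDispatcher:
    """Routes recognized utterances to registered device/software adapters."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[], None]] = {}

    def register(self, phrase: str, handler: Callable[[], None]) -> None:
        self._handlers[phrase.lower()] = handler

    def dispatch(self, utterance: str) -> None:
        handler = self._handlers.get(utterance.strip().lower())
        if handler is None:
            print(f"No handler for: {utterance!r}")
        else:
            handler()


if __name__ == "__main__":
    pleo, feeds = PleoAdapter(), FeedAdapter()
    dispatcher = CommandDispatcher()
    dispatcher.register("pleo walk", pleo.walk)
    dispatcher.register("read the news", feeds.read_headlines)
    dispatcher.dispatch("Pleo walk")      # -> Pleo: walking forward
    dispatcher.dispatch("read the news")  # -> Reading latest headlines aloud ...

Registering new phrases at runtime is what would let such a platform hand speech commands to new machines or software components with little effort.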
Future plans are to realize a truly multimodal and cognitive dialogue, so that the dialogue system gains rudimentary abilities of self-reflection. The system will always show rational behavior and will be able to account for its behavior at any time, because it knows why it acted the way it did.
Proactive assistive functions, in the sense that the system classifies actions together with events and reads human intentions (intention detection), will support the user by anticipating human behavior.
Future versions of the software will include modules to understand, induce and talk about emotions. Another goal is to integrate modules equipping the system with an aptitude for self-awareness, so that it recognizes itself as part of the environment and is able to communicate accordingly.
Videos
Left: MuDiS interaction experiment 2009 (staged for illustration purposes). Middle, Right: Controlling a wireless Pleo robot with Wiimote gestures and speech.
The video on the left illustrates our experiment in 2009. In a common human-machine setup, we tried to induce emotions in our subjects. Data were acquired by several cameras, a microphone, and sensors for skin resistance and heart rate. The experiment itself consisted of two major parts. In the introductory part, the test persons were asked to name the size and color of Lego bricks and to explain how to build a pyramid from bricks; in the second phase, the subject and the robot had to build a structure together. Emotions were induced in the second phase.
The second and third videos showcase one of our multimodal cognitive dialogue systems. The demo shows how it passes speech and gesture commands to an autonomous system.
People
- Manuel Giuliani, M.Sc.
- Dipl.-Inform. Michael Kaßecker
Partners
- Image Understanding & Knowledge-Based Systems
- Human-Machine Communication, Department of Electrical Engineering and Information Technologies
- Institut für Arbeitswissenschaft, Universität der Bundeswehr
Acknowledgement
This ongoing work is part of the Cluster of Excellence CoTeSys. The project is closely related to the other CoTeSys projects in our group, especially BAJA, CogArch, and JAHIR.
Publications
[1] Markus Rickert, Michael Kaßecker, and Alois Knoll. Aufgabenbeschreibung mit verhaltensbasierter Robotersteuerung und natürlicher Kommunikation. Technical Report TUM-I0914, Technische Universität München, Munich, Germany, 2009. [ .bib | .pdf ]
[2] Manuel Giuliani, Michael Kaßecker, Stefan Schwärzler, Alexander Bannat, Jürgen Gast, Frank Wallhoff, Christoph Mayer, Matthias Wimmer, Cornelia Wendt, and Sabrina Schmidt. MuDiS - a multimodal dialogue system for human-robot interaction. In Proceedings of the International Workshop on Cognition for Technical Systems, Munich, Germany, 2008. [ .bib | .pdf ]
[3] Frank Wallhoff, Jürgen Gast, Alexander Bannat, Stefan Schwärzler, Gerhard Rigoll, Cornelia Wendt, Sabrina Schmidt, Michael Popp, and Berthold Färber. Real-time framework for on- and off-line multimodal human-human and human-robot interaction. In Proceedings of the International Workshop on Cognition for Technical Systems, Munich, Germany, 2008. [ .bib | .pdf ]