Rebecca D. Rees and Dr. Deryle Lonsdale, Linguistics
Within every human being lies the desire and the power to communicate: both to decide what to say and when to say it, and to unravel the meaning in someone else’s message. And dialogue is our fundamental tool of communication. Even a writer must take into account their intended audience and the most effective way to convey their “monologue”; my paper, for example, even if put away on some forgotten library shelf, will complete the communicative exchange (which I began when I started writing) the moment a reader picks it up, dusts it off, and opens the cover. The message will get out.
As people develop techniques for handling dialogue in computer systems, their understanding of the way humans approach dialogue shapes their perception of the problem. Theorists have identified issues in dialogue such as turn-taking, implicature, adjacency pairs, grounding acts, and subdialogues. Early systems exploring the function of dialogue used simple regular-expression matching. Later systems tried to incorporate models of dialogue structure such as dialogue grammars, illocutionary acts, belief-desire-intention models, dialogue plans, and joint action. Their developers grappled with questions of initiative as they sought to grant the user more initiative alongside the system’s, working toward truly mixed-initiative systems. This project explored the functionality of several different approaches to managing dialogue.
Using a finite-state automaton (FSA) model last summer at USC’s Information Sciences Institute (ISI) gave my dialogue system one omniscient controller, excusing the system from having to represent a worldview or explain why each state occurred. The system simply listened for particular words that indicated which preprogrammed response matched the human user’s last utterance. This type of dialogue control is limited by the imagination of the programmer, who must not only write the computer’s responses but also specify which words the system spot-checks for. Though the structure was rigid, careful wording and state choices produced a reasonable approximation of the Hollywood script this dialogue came from.
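A minimal sketch of this keyword-spotting style of control, written here in Prolog (the language the later systems in this project were built in), might look like the following; the states, trigger words, and canned lines are hypothetical stand-ins, not the actual ISI script:

```prolog
% transition(State, Keyword, NextState, Response): hand-authored by the
% programmer, who must anticipate both the replies and the trigger words.
transition(start,   hello,  mission, 'Captain, we have new orders.').
transition(mission, orders, brief,   'We move out at dawn.').
transition(mission, no,     start,   'Stand by, then.').

% The controller spot-checks the user's utterance (a list of words) for
% a keyword it recognizes in the current state, then emits the matching
% preprogrammed response and moves to the next state.
respond(State, Utterance, NextState, Response) :-
    transition(State, Keyword, NextState, Response),
    member(Keyword, Utterance).
```

A query like `respond(start, [well, hello, there], Next, Reply)` fires the first transition; any utterance the programmer failed to anticipate simply falls through unanswered, which is exactly the rigidity described above.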
In an attempt to actually explain the behavior of dialogue systems, Göteborg University’s Trindi architecture provides a belief-desire-intention (BDI) model of dialogue. Working with the PSST (Pedagogical Software and Speech Technology) and Soar research groups on campus, we created several new dialogue scenarios to integrate different technologies and capabilities, including the Trindi dialogue management system. While other students focused on integrating additional resources into our systems, like the Web, these projects allowed me to build and explore a dialogue system using the joint action theory TrindiKit offered.
Our first scenario, the Set-a-Date program, was created to help a “needy” student plan a social date at Brigham Young University. Parsing the BYU homepage for calendar and activity information, fellow students built a database of potential activities and the helpful information associated with those campus events: event type, event title, date, time, place (building and room), and price. The dialogue engine I constructed interacted with the user to find out what calendar date the participant preferred, allowing the user to take control of the conversation and reorder the flow of information. The system then used that information to consult the database we had created and generate an appropriate match. The Set-a-Date engine asked the user whether the event type was acceptable, and informed the user how much the event cost, where it would be held, and when it would start. The structure of the dialogue and the database knowledge proved well suited to the information parsed from the BYU calendar.
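A sketch of how such a database and lookup might be encoded, again in Prolog, appears below; the predicate name, field order, and sample events are invented placeholders rather than the project’s actual schema:

```prolog
% event(Type, Title, Date, Time, Place, Price): one fact per activity
% parsed from the BYU calendar (the entries here are invented).
event(concert, 'Jazz Night',    date(10, 12), time(19, 30),
      'de Jong Concert Hall', 8).
event(lecture, 'Forum Address', date(10, 15), time(11, 0),
      'Marriott Center', 0).

% Given the calendar date the user settled on, propose a matching event
% along with the details the engine reports back to the user.
match(Date, Type, Title, Place, Time, Price) :-
    event(Type, Title, Date, Time, Place, Price).
```

Because the match is a single relational query, the engine can fill in whichever fields the user has supplied so far and leave the rest as unbound variables, which is part of what lets the user reorder the flow of information.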
The second scenario allowed the user to play a genealogy quiz game on our GedQuiz engine. Building a database from a GEDCOM file, we took advantage of Prolog inferencing and real-world knowledge to establish rules showing the relationships between different individuals in the database. This way, the system could ask more intelligent questions during the game: “Who was born on Christmas?”, “Which of your relatives immigrated to the United States before they died?”, or “Who was buried in a different place than where they died?” Using the Trindi architecture, I created new ways to ask questions: fill-in-the-blank, true/false, yes/no, and multiple choice. The idea behind this game was to find a way to appeal to a new generation’s interest in genealogy. The game uses Prolog’s inferencing capabilities to provide a deeper understanding of our own genealogy than is normally possible; it provides a new way to look at and interact with our history; it provides context to those exhaustive lists of names and dates; it steps beyond the preprogrammed FSA response.
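The rules behind such questions are short Prolog clauses over facts extracted from the GEDCOM file; the encoding below is an illustrative guess, with invented predicate names and an invented person, not the project’s actual rules:

```prolog
% Facts extracted from the GEDCOM file (person and dates invented):
% born/died/buried(Person, date(Year, Month, Day), Place).
born(ann,   date(1880, 12, 25), liverpool).
died(ann,   date(1952,  3,  4), provo).
buried(ann, date(1952,  3,  7), springville).

% "Who was born on Christmas?" (any year, December 25th)
born_on_christmas(Person) :-
    born(Person, date(_, 12, 25), _).

% "Who was buried in a different place than where they died?"
buried_elsewhere(Person) :-
    died(Person, _, DeathPlace),
    buried(Person, _, BurialPlace),
    BurialPlace \== DeathPlace.
```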
The Trindi architecture allowed the system to model the agent’s beliefs and goals, as well as its perception of the user’s beliefs and goals, in an information state (IS). A dialogue move engine (DME) updates the IS and selects new dialogue moves depending on the context of the user’s utterance, the system’s goals, and the conversation model. Dialogue plans and the system’s implementation in Prolog allow it to accommodate the user at any point in the conversation. Expanding the GoDiS system extended these abilities, allowing different types of questions and new methods for accessing the database. However, the simple interpretation and generation capabilities produce more canned responses than intelligent decisions, and dialogue plans must be hard-coded, since general plan recognition is an intractable problem anyway.
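To make the IS/DME division concrete, here is a schematic update rule in plain Prolog; it mimics the flavor of TrindiKit’s rules (preconditions on the information state, an effect that yields a new one) but is my own simplification, not TrindiKit’s actual rule syntax:

```prolog
% A toy information state: the system's agenda, the questions under
% discussion (QUD), and the propositions both parties have grounded.
% state(Agenda, QUD, Shared)

% Schematic update rule: if the user's move answers the question at the
% top of the QUD, pop that question and add the answer to the shared
% ground. The preconditions sit in the body; the effect is the new IS.
integrate_answer(move(answer, Prop),
                 state(Agenda, [Question|QUD], Shared),
                 state(Agenda, QUD, [Prop|Shared])) :-
    resolves(Prop, Question).

% Toy resolution check: a proposition resolves a wh-question about a
% predicate if it supplies a value for that predicate.
resolves(prop(Pred, Value), whq(Pred)) :- nonvar(Value).
```

Because rules like this pattern-match against whatever question happens to be under discussion, the user can answer out of order and the engine still integrates the move; that is the accommodation described above.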
Another approach to controlling conversations, NL-Soar, illustrates an interesting addition to dialogue management systems. Its cognitive architecture allows the system to learn from its own experiences, creating discourse recipes from the plans it has learned. Once NL-Soar’s dialogue component has acquired a recipe, it can take advantage of it both in generation and in comprehension. The system maintains a shared background knowledge area and a conversational record in its attempt to model the user’s goals and keep track of its own. Much work still needs to be done on this system, but its potential as a dialogue management system is promising.
Each type of system (FSA, BDI, and NL-Soar) has its own advantages, and each teaches us new concepts for explaining how humans communicate. Integrating insights from these different approaches would allow a broader, more useful, more robust control system that benefits from the successes of each. No system presumes to have all the answers to the many unsolved questions in dialogue processing. NL-Soar’s potential for managing dialogue, though, with its model of the human approach to language, seems to offer the most intriguing solution. If the way we learn and represent knowledge works for us, why not for a system modeling us?