Task-Centered Conversation Systems for Computer-Assisted Language Learning

Aric Bills and Dr. Deryle Lonsdale, Linguistics and English Language

Many language educators have envisioned computer systems capable of providing language learners with intelligent spoken interaction in the target language. Although a fully conversant computer is likely still years away, several available resources can be combined to build scaled-down, domain-restricted, speech-driven conversational systems from which language learners can benefit. Many of these resources are free for academic use and run on a personal computer. My project explores new ways to combine these resources to create useful task-centered conversational systems.

For my project, I created two conversation systems. The first involves a twenty-questions-style dialogue where the user picks an animal and answers the computer’s questions as the computer tries to guess what the user is thinking of. The second system leads a surface-level discussion on the user’s home country, covering topics including major industries, bordering nations, and climate.

The backbone for the systems is a group of programs collectively known as the CSLU Speech Toolkit, which was developed by a consortium of researchers at several different institutions.ⁱ The systems were built using the toolkit’s Rapid Application Developer as well as a CSLU-enhanced version of the scripting language Tcl. The toolkit controls an animated agent, Baldi, who is programmed to model the movements of the human articulators (particularly the lips, mouth, tongue, and teeth). Baldi’s movements are synchronized with a sound signal generated by the Festival text-to-speech engine, giving the appearance that Baldi is talking. The toolkit can pause and listen to user responses spoken into a microphone, and can recognize from a list of expected responses what the user said.

The CSLU toolkit has two limitations which hinder the development of a conversation system: first, it can’t accurately recognize long utterances; and second, it can only recognize utterances which it anticipates, i.e. ones which have been “pre-programmed” into the system. To make my project feasible, I attempted to create questions which would elicit short responses within a specific domain.

The greatest obstacles were obtaining enough knowledge to allow for a meaningful conversation, and making that knowledge usable to the system. I explored the possibility of using several different resources and eventually selected Princeton’s WordNet,ⁱⁱ the CIA World Factbook,ⁱⁱⁱ and the zoo database from UC Irvine’s Machine Learning Repository.ⁱᵛ I used this knowledge 1) to help the system determine what comments to make and what questions to ask, and 2) to help the system anticipate the user’s responses.

To make the knowledge usable, I converted the zoo database into a database of Prolog assertions, and I created a structured index of the CIA World Factbook. WordNet was already machine-readable, so no changes were necessary.
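The conversion of the zoo database can be illustrated with a minimal sketch. It is written in Python for illustration only; the project itself used Tcl and Prolog. The attribute order follows the published layout of the zoo data, while the predicate name animal/3, the fact layout, and the helper row_to_facts are assumptions rather than the format actually used in the project.

```python
# Sketch: turn one comma-separated row of the UCI zoo data into Prolog facts.
# ASSUMPTION: the predicate animal(Name, Attribute, Value) is hypothetical;
# the original project's fact format is not described in the text.

# Attribute columns that follow the animal name in each row of the dataset.
ATTRIBUTES = [
    "hair", "feathers", "eggs", "milk", "airborne", "aquatic",
    "predator", "toothed", "backbone", "breathes", "venomous",
    "fins", "legs", "tail", "domestic", "catsize", "type",
]


def row_to_facts(line):
    """Convert one zoo row into a list of Prolog fact strings."""
    fields = line.strip().split(",")
    name, values = fields[0], fields[1:]
    if len(values) != len(ATTRIBUTES):
        raise ValueError(
            f"expected {len(ATTRIBUTES)} attribute values, got {len(values)}"
        )
    # Quote the animal name so it is always a valid Prolog atom.
    return [
        f"animal('{name}', {attribute}, {value})."
        for attribute, value in zip(ATTRIBUTES, values)
    ]


if __name__ == "__main__":
    # Sample row in the dataset's comma-separated layout.
    sample = "aardvark,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1"
    for fact in row_to_facts(sample):
        print(fact)
```

Storing one fact per attribute keeps each query simple: a Prolog goal can ask for every animal with a given attribute value without knowing the position of that attribute in the row.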

The first system uses the information in the zoo database. A Tcl process running inside the CSLU toolkit asks questions about specific attributes in the database. As the user answers each question, a Prolog process gives the toolkit a list of animals that meet the user’s criteria. When Prolog returns a single animal, the Tcl process asks the user whether that is the animal they had in mind. For this system, I relied on Prolog to help the system determine how many questions to ask and when to make a guess about what animal the user was thinking of. I developed client/server code in Tcl and Prolog to allow the two programs to run in parallel and to communicate via sockets. Such a link could be used more extensively in the future to take advantage of Prolog’s forward inferencing capabilities. For example, Prolog might be used to help determine which of a set of questions might be most relevant for a given situation.
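The narrowing step described above can be sketched roughly as follows. This is a Python stand-in for the role the Prolog process plays, not the project's code: the small knowledge base, the attribute choices, and the function names filter_candidates and play are all hypothetical, and the real system exchanged this information between Tcl and Prolog over sockets.

```python
# Sketch of the twenty-questions narrowing loop.
# ASSUMPTION: this tiny knowledge base is an illustrative stand-in for the
# zoo database; names and structure are hypothetical.
KNOWLEDGE_BASE = {
    "dolphin": {"milk": True, "aquatic": True, "feathers": False},
    "bear": {"milk": True, "aquatic": False, "feathers": False},
    "penguin": {"milk": False, "aquatic": True, "feathers": True},
    "sparrow": {"milk": False, "aquatic": False, "feathers": True},
    "bass": {"milk": False, "aquatic": True, "feathers": False},
}


def filter_candidates(candidates, attribute, answer):
    """Keep only the animals whose attribute matches the user's answer."""
    return [a for a in candidates if KNOWLEDGE_BASE[a][attribute] == answer]


def play(answer_for, questions):
    """Ask about each attribute in turn until at most one animal remains.

    answer_for is a callable standing in for the speech recognizer: given
    an attribute name, it returns the user's yes/no answer as a bool.
    Returns the remaining candidates and the attributes actually asked.
    """
    candidates = list(KNOWLEDGE_BASE)
    asked = []
    for attribute in questions:
        if len(candidates) <= 1:
            break  # enough information to make a guess (or nothing left)
        asked.append(attribute)
        candidates = filter_candidates(
            candidates, attribute, answer_for(attribute)
        )
    return candidates, asked


if __name__ == "__main__":
    secret = KNOWLEDGE_BASE["penguin"]  # simulate a user thinking of a penguin
    remaining, asked = play(lambda attr: secret[attr],
                            ["milk", "aquatic", "feathers"])
    print(remaining, asked)
```

Stopping as soon as one candidate remains is what lets the number of questions vary from game to game; a fixed question order is used here only to keep the sketch short.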

The second system draws heavily on information from the CIA World Factbook and WordNet. The system begins by asking users where they are from and, depending on the user’s responses and the knowledge the system can access, may ask questions or make comments about major industries in the country, bordering nations, and the country’s climate. To make this system work, it was essential to be able to program the recognizer dynamically, so that the toolkit would anticipate different vocabulary in different circumstances. For example, in order to discuss a country’s industries, the recognizer must anticipate an entirely different set of words for Belgium than for Bolivia.
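One way to picture dynamic programming of the recognizer is as a lookup that builds the list of expected responses from the structured index each time a question is asked. The sketch below is in Python and rests on several assumptions: the index structure, the function expected_responses, and the short industry lists are hypothetical placeholders and are not quoted from the Factbook or taken from the project.

```python
# Sketch: build the recognizer's expected vocabulary per country and topic.
# ASSUMPTION: FACTBOOK_INDEX is a hypothetical, heavily abbreviated stand-in
# for the structured index described in the text. The industry entries are
# illustrative placeholders, not Factbook quotations.
FACTBOOK_INDEX = {
    "Belgium": {
        "industries": ["chemicals", "textiles"],
        "borders": ["France", "Germany", "Luxembourg", "Netherlands"],
    },
    "Bolivia": {
        "industries": ["mining", "tobacco"],
        "borders": ["Argentina", "Brazil", "Chile", "Paraguay", "Peru"],
    },
}


def expected_responses(country, topic):
    """Return the sorted, lowercased words the recognizer should listen for.

    An unknown country or topic yields an empty list, signalling that the
    dialogue should fall back to a different question.
    """
    entry = FACTBOOK_INDEX.get(country)
    if entry is None:
        return []
    return sorted({word.lower() for word in entry.get(topic, [])})


if __name__ == "__main__":
    for country in ("Belgium", "Bolivia"):
        print(country, expected_responses(country, "industries"))
```

Because the list is rebuilt for every question, the same dialogue script can be reused for any country that appears in the index.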

A major challenge was programming the system to find this information in the knowledge sources and to parse it appropriately. One problem that I had not anticipated was countries whose names had other meanings (for example, Turkey and Guinea). To solve this problem, I programmed the system to look at the noun senses in WordNet to ensure that it had found the appropriate entry for each country name.
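The sense check can be sketched as a filter over a word’s noun senses. The Python example below uses a small hand-built sense inventory in place of WordNet itself; the dictionary layout, the paraphrased glosses, the category labels, and the function country_sense are all assumptions made for illustration and do not reproduce WordNet’s data or interface.

```python
# Sketch: pick the "country" sense of an ambiguous name.
# ASSUMPTION: NOUN_SENSES is a toy stand-in for WordNet's noun senses.
# Glosses are loose paraphrases and the "kind_of" labels are hypothetical.
NOUN_SENSES = {
    "turkey": [
        {"gloss": "a large bird raised for food", "kind_of": ["bird"]},
        {"gloss": "a republic in Asia Minor and the Balkans",
         "kind_of": ["country"]},
    ],
    "guinea": [
        {"gloss": "a former British gold coin", "kind_of": ["coin"]},
        {"gloss": "a republic in western Africa", "kind_of": ["country"]},
    ],
    "bolivia": [
        {"gloss": "a landlocked republic in South America",
         "kind_of": ["country"]},
    ],
}


def country_sense(name):
    """Return the first noun sense of name that denotes a country, or None."""
    for sense in NOUN_SENSES.get(name.lower(), []):
        if "country" in sense["kind_of"]:
            return sense
    return None


if __name__ == "__main__":
    for word in ("Turkey", "Guinea", "Bolivia", "Penguin"):
        sense = country_sense(word)
        print(word, "->", sense["gloss"] if sense else "no country sense found")
```

Checking what each sense is a kind of, rather than taking the first sense listed, is what keeps the bird from being selected when the user names Turkey as a home country.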

Although the content of this system’s dialogue varies depending on user input, it is still formulaic. In the future, I hope to examine ways to overcome this limitation.ᵛ

___________________________________
ⁱ The consortium consists of the Oregon Graduate Institute’s Center for Spoken Language Understanding (which developed the various toolkit utilities and the speech recognition software), the University of Edinburgh’s Centre for Speech Technology Research (which developed the text-to-speech engine Festival), and UC Santa Cruz’s Perceptual Science Laboratory (which developed Baldi, the anatomically correct animated agent). Information about the toolkit (including how it may be obtained) is available online at time of writing: http://cslu.cse.ogi.edu/toolkit/index.html.
ⁱⁱ WordNet is a large lexical ontology which connects words by various relationships. WordNet was developed at Princeton’s Cognitive Science Laboratory under the direction of George A. Miller. Information on WordNet is available online at time of writing: http://www.cogsci.princeton.edu/~wn/.
ⁱⁱⁱ The CIA World Factbook is a compilation of geographic, demographic, economic, and other data concerning the world’s nations. It was compiled by the US Central Intelligence Agency and is in the public domain. Information about the CIA World Factbook is available online at time of writing: http://www.cia.gov/cia/publications/factbook/.
ⁱᵛ Blake, C.L. & Merz, C.J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science. URL available at time of writing.
ᵛ Special thanks to Deryle Lonsdale for his mentoring and to the Linguistics department for providing workspace, equipment, and additional funding for this project.

Brigham Young University

Journal of Undergraduate Research
