Morphologically Parsing the Cebuano Lexicon

Jarren Bodily and Dr. Deryle Lonsdale, Linguistics

A morpheme is the smallest grammatical unit of meaning, such as a prefix, a root word, or a suffix. Morphology, then, is the study of the processes by which morphemes combine to form words (Finch, 2000). Understanding how words are formed is essential to applications such as speech recognition, web searches, and corpus searches, which depend on identifying every instance of a word, including all of its inflected forms. For example, a search for the word run must recognize running, runner, and ran as possible results, even though they contain additional morphemes or are irregular conjugated forms. Beyond technical applications, understanding morphology is a necessary decoding skill for both natural speech and reading comprehension.

For the past year, with the help of my faculty mentor, Deryle Lonsdale, I have researched and written morphological rules that describe the formation of words found in the Cebuano lexicon. These rules are written to function within PC-KIMMO, a finite-state two-level processor for morphological analysis. PC-KIMMO was designed as a skeletal program: language-specific files are written and inserted into the pre-existing morphological processor. The processor operates on three files: the rule file, the lexical file, and the grammar file.

The rule file accounts for specific morphological and morphophonological processes such as prefixation, infixation, assimilation, and deletion. It is responsible for recognizing how words change as morphemes are added. The lexical file consists of a glossary of the root words, prefixes, suffixes, and infixes found in the language. It is responsible for recognizing root words and all of the affixes that have been added to those roots. The grammar file specifies the number and order of affixes that can be added to the root words in the lexical file. It is responsible for guiding proper word formation with regard to the possible combinations of morphemes.
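The division of labor between the lexical file and the grammar file can be illustrated with a minimal Python sketch. This is not PC-KIMMO's file format. The names Entry, LEXICON, lookup, and is_well_formed are hypothetical, the "pre"/"post" slots simply record where each morpheme sits relative to the root in the example string used later in this report, and "AFF" is a placeholder rather than a real gloss. The rule component (spelling changes) is left out of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    form: str   # lexical spelling of the morpheme
    slot: str   # "pre", "root", or "post": position relative to the root
    gloss: str  # English gloss; "AFF" is a placeholder, not a real gloss


# Lexical component: a toy glossary holding only morphemes named in this report.
LEXICON = {
    e.form: e
    for e in [
        Entry("pag", "pre", "AFF"),
        Entry("ka", "pre", "AFF"),
        Entry("ma", "pre", "AFF"),
        Entry("pa", "pre", "AFF"),
        Entry("in", "pre", "AFF"),
        Entry("ubos", "root", "under"),
        Entry("an", "post", "AFF"),
        Entry("on", "post", "AFF"),
    ]
}

MAX_AFFIXES = 9  # the grammar file described here allows up to nine affixes


def lookup(lexical: str) -> list[Entry]:
    """Lexical component: map '+'-separated morphemes to lexicon entries."""
    return [LEXICON[m] for m in lexical.split("+")]  # KeyError if unknown


def is_well_formed(entries: list[Entry]) -> bool:
    """Grammar component: 'pre' slots, exactly one root, then 'post' slots."""
    slots = [e.slot for e in entries]
    if slots.count("root") != 1:
        return False
    r = slots.index("root")
    ordered = all(s == "pre" for s in slots[:r]) and all(
        s == "post" for s in slots[r + 1:]
    )
    return ordered and len(slots) - 1 <= MAX_AFFIXES
```

For example, is_well_formed(lookup("pag+ka+ma+pa+in+ubos+an+on")) holds, while a sequence with no root, or with a "post" morpheme before the root, is rejected.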

The rule file that I have written for Cebuano has seventeen rules covering a number of morphophonological processes, from simple vowel and consonant deletion to infixation and nasal assimilation. PC-KIMMO compiles these rules into finite-state tables, some of which are as large as six- or seven-dimensional matrices. In addition to the rule file, I have created a lexical file with around five thousand entries of Cebuano roots and affixes, all of which are glossed in English, and a grammar file that allows a root to take up to nine affixes in the form of prefixes, infixes, or suffixes. Though these files account for over two thirds of the Cebuano lexicon, rules still need to be written for the processes of consonant-vowel reduplication and metathesis.
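The compiled tables themselves are not reproduced in this report. As a rough illustration of what a two-level finite-state table looks like, the following Python sketch encodes one toy rule: a lexical o may correspond to nothing on the surface (written o:0) only when it is followed by s, a morpheme boundary, and an. The table, its column set, and the function names are assumptions made for illustration and are not one of the seventeen rules in the actual rule file.

```python
# Columns are lexical:surface pairs. "0" is the null symbol and "@:@"
# stands for any pair not listed explicitly.
COLUMNS = ["o:0", "s:s", "+:0", "a:a", "n:n", "@:@"]

# TABLE[state][column index] gives the next state; 0 means "reject".
TABLE = {
    1: [2, 1, 1, 1, 1, 1],  # default state: o:0 starts the restricted context
    2: [0, 3, 0, 0, 0, 0],  # after o:0, only s:s may follow
    3: [0, 0, 4, 0, 0, 0],  # then the morpheme boundary +:0
    4: [0, 0, 0, 5, 0, 0],  # then a:a
    5: [0, 0, 0, 0, 1, 0],  # then n:n, which returns to the default state
}
FINAL_STATES = {1}


def align(lexical: str, surface: str) -> list[str]:
    """Pair two equal-length strings symbol by symbol."""
    if len(lexical) != len(surface):
        raise ValueError("pad the surface string with 0 so the lengths match")
    return [f"{lex}:{srf}" for lex, srf in zip(lexical, surface)]


def accepts(pairs: list[str]) -> bool:
    """Run the pairs through the table and report whether they are allowed."""
    state = 1
    for pair in pairs:
        column = COLUMNS.index(pair) if pair in COLUMNS else COLUMNS.index("@:@")
        state = TABLE[state][column]
        if state == 0:
            return False
    return state in FINAL_STATES
```

Under these assumptions, accepts(align("ubos+an", "ub0s0an")) is true, while accepts(align("ubos", "ub0s")) is false because the deletion occurs without the required context. This toy table only restricts where the deletion may occur; it does not force the deletion to happen.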

Cebuano is a Malayo-Polynesian language, part of the Austronesian language family. These languages are morphosyntactic, meaning that their morphology is complex and is heavily involved in the grammaticalization of ideas and concepts. An example from my work with PC-KIMMO is the grammaticalization of the concept of humility. Humility in Cebuano is pagkamapainubsanon. This word consists of a root word, ubos (under), and seven nominal affixes, each of which molds the meaning of ubos into the concept of humility. Within PC-KIMMO, you can either form words from roots and affixes or break inflected words down into their roots and affixes. When you type the command generate, followed by pag+ka+ma+pa+in+ubos+an+on, PC-KIMMO generates pagkamapainubsanon, automatically applying the vowel deletion rule that turns ubos into ubs when it is followed by -an. Conversely, when you type the command recognize, followed by pagkamapainubsanon, PC-KIMMO breaks the word down into all possible morphological parses, using the lexical file to check each possible root and affix. For each parse, PC-KIMMO also glosses the root and affixes with their English counterparts or equivalents. In this particular case, pagkamapainubsanon has twenty-seven valid parses. To narrow down the number of parses for a word, features such as word category or verb type can be added to each entry in the lexical file. This would restrict the affixes that can attach to a root to those whose features match the features of that root. Adding features to the lexical file is another area where further work is required.
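The generate and recognize directions can be imitated in a few lines of Python. This is only a sketch under strong assumptions: it does not call PC-KIMMO, the lexicon contains just the eight morphemes of the example, and the vowel deletion rule is simplified to "delete the last vowel of a root when the next morpheme is an", which may not match the conditions in the actual rule file. All function names are hypothetical.

```python
AFFIXES = {"pag", "ka", "ma", "pa", "in", "an", "on"}
ROOTS = {"ubos"}
VOWELS = "aeiou"


def delete_last_vowel(form: str) -> str:
    """Toy vowel deletion: drop the last vowel of the form (ubos -> ubs)."""
    for i in range(len(form) - 1, -1, -1):
        if form[i] in VOWELS:
            return form[:i] + form[i + 1:]
    return form


def generate(lexical: str) -> str:
    """Join '+'-separated morphemes, reducing a root that precedes 'an'."""
    morphemes = lexical.split("+")
    pieces = []
    for i, m in enumerate(morphemes):
        followed_by_an = i + 1 < len(morphemes) and morphemes[i + 1] == "an"
        if m in ROOTS and followed_by_an:
            m = delete_last_vowel(m)
        pieces.append(m)
    return "".join(pieces)


def recognize(surface: str) -> list[str]:
    """Return every segmentation whose generated form equals the surface word."""
    parses = []

    def walk(pos: int, parse: list[str]) -> None:
        if pos == len(surface):
            lexical = "+".join(parse)
            one_root = sum(m in ROOTS for m in parse) == 1
            if one_root and generate(lexical) == surface:
                parses.append(lexical)
            return
        for form in sorted(AFFIXES | ROOTS):
            # A root may surface either in full or in its reduced shape.
            shapes = {form, delete_last_vowel(form)} if form in ROOTS else {form}
            for shape in shapes:
                if surface.startswith(shape, pos):
                    walk(pos + len(shape), parse + [form])

    walk(0, [])
    return parses


print(generate("pag+ka+ma+pa+in+ubos+an+on"))  # pagkamapainubsanon
print(recognize("pagkamapainubsanon"))         # ['pag+ka+ma+pa+in+ubos+an+on']
```

With this eight-morpheme lexicon only one parse survives. The twenty-seven parses reported above arise because the full lexical file offers many more candidate roots and affixes for each stretch of the word.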

To chart my progress as I wrote and tested the morphological rules for Cebuano, I drew up a hierarchy of affixes, making sure to work on the most productive ones first. Once rules were written to govern and constrain one affix, I moved on to the next affix on my list. I also created a list of twenty-five highly inflected words that I had seen in articles, grammar books, and other authentic and translated materials. Each time I finished a rule, I tested it with one of the words from the list. If I was able to recognize the word in PC-KIMMO, then I knew that the rule worked; further, if I was able to generate the word in PC-KIMMO, then I knew that my grammar file could handle the sequence of morphemes. Through this process of writing and testing rules, I was able to estimate my progress with regard to the scope and coverage of my rules in relation to the whole Cebuano lexicon.
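This testing routine amounts to a round-trip check over a word list, which could be automated along the following lines. The sketch is hypothetical: round_trip and coverage are illustrative names, and the generate and recognize arguments stand in for whatever mechanism sends a form to the analyzer. The two stand-in functions at the bottom are deliberately naive so that the example runs on its own.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Report = Dict[str, Dict[str, bool]]


def round_trip(
    cases: Iterable[Tuple[str, str]],
    generate: Callable[[str], str],
    recognize: Callable[[str], List[str]],
) -> Report:
    """Check each (lexical, surface) pair in both directions."""
    report: Report = {}
    for lexical, surface in cases:
        report[surface] = {
            # Recognition succeeds if the expected parse is among the results.
            "recognized": lexical in recognize(surface),
            # Generation succeeds if the expected surface form is produced.
            "generated": generate(lexical) == surface,
        }
    return report


def coverage(report: Report) -> float:
    """Fraction of test words that pass in both directions."""
    if not report:
        return 0.0
    passed = sum(all(result.values()) for result in report.values())
    return passed / len(report)


# Stand-ins for the real analyzer (assumptions, not PC-KIMMO behavior):
def naive_generate(lexical: str) -> str:
    return lexical.replace("+", "")  # plain concatenation, no spelling rules


def empty_recognize(surface: str) -> List[str]:
    return []  # an analyzer with no lexicon finds no parses


CASES = [("pag+ka+ma+pa+in+ubos+an+on", "pagkamapainubsanon")]
REPORT = round_trip(CASES, naive_generate, empty_recognize)
print(REPORT, coverage(REPORT))
```

Because the stand-in generator applies no vowel deletion, the example word fails in both directions. A failure of this kind is the signal that a rule or lexical entry is still missing.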

Though there is still more work to do on this project, I have been able to learn and accomplish a great deal in the space of one year. It has been great preparation for the master's program that I am starting. The chance to work with faculty mentors on projects of this sort has been an invaluable experience and has given me the desire to take this project as far as I can in the coming years. A morphological database of this type will be extremely useful in a number of applications like those mentioned at the beginning of this paper. As a master's student in Linguistics, I hope to continue the work and see it put to actual use.

References

Finch, G. (2000). Linguistic terms and concepts. New York: St. Martin’s Press.

Brigham Young University

Journal of Undergraduate Research
