John D. Crawford and Dr. Deryle Lonsdale, Linguistics
The purpose of this project was to produce a phrase-structure grammar for Biblical Hebrew and to determine the best way to apply said grammar in a computerized parser in order to test its durability. Overall I believe this project was a success. Although the parser has only been tested within the first few chapters of Genesis, there is sufficient variety of syntax there to test the parser’s capabilities. In addition, the problems of developing a parser have made the more difficult aspects of the application of linguistic theory to real data much more real and interesting for me.
After some discussion with my mentor, I chose to use the Government-Binding (GB) theory of linguistic usage as the foundation of my grammar. The chief reasons for this decision were my familiarity with the theory and the nature of the parsing program available. I had previously used a parsing program developed by James Allen that relies on GB theory, and I was experienced in writing the kind of grammars it needs to function properly. However, adapting a Hebrew grammar to our parsing program proved to be more difficult than I had expected.
Hebrew typically has verb-subject-object (VSO) word order, and this was harder to describe within GB theory than I had initially realized. Verb-initial clauses were not easily drawn with tree diagrams. I found that I had to turn to more traditional descriptors within the grammar in order to explain the syntactic functions of the words within the sentence, as I could not adequately draw their structure with the parser I was using. Therefore, I had to write Phrase-Structure (PS) rules that looked like this:
(Clause) > ((Head (Verb Phrase)) (Subject) (Object)).
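To make the shape of such a rule concrete, here is a minimal sketch in Python of a toy top-down parser built around a verb-initial clause rule. It is not the parser used in this project. The category names, the tiny lexicon, and the helper functions are all hypothetical, and the sample input is a simplified, pre-segmented transliteration of part of Genesis 1:1.

```python
# Toy phrase-structure grammar. Every category name and lexicon entry
# here is an illustrative assumption, not the project's actual grammar.
GRAMMAR = {
    "Clause": [["VP", "Subject", "Object"], ["VP", "Subject"]],  # verb-initial
    "VP": [["V"]],
    "Subject": [["NP"]],
    "Object": [["ObjMarker", "NP"], ["NP"]],
    "NP": [["Det", "N"], ["N"]],
}

# Simplified transliterations, with the article written as a separate token.
LEXICON = {
    "bara": "V",          # "created"
    "elohim": "N",        # "God"
    "et": "ObjMarker",    # direct-object marker
    "ha": "Det",          # definite article
    "shamayim": "N",      # "heavens"
}


def parse(symbol, tokens, start):
    """Yield (tree, next_index) for each way `symbol` can begin at `start`."""
    if symbol not in GRAMMAR:
        # Preterminal category: match exactly one token through the lexicon.
        if start < len(tokens) and LEXICON.get(tokens[start]) == symbol:
            yield (symbol, tokens[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, tokens, start, [])


def expand(symbol, rhs, tokens, pos, children):
    """Match the remaining right-hand-side categories one after another."""
    if not rhs:
        yield (symbol, *children), pos
        return
    for child, nxt in parse(rhs[0], tokens, pos):
        yield from expand(symbol, rhs[1:], tokens, nxt, children + [child])


def parse_clause(tokens):
    """Return every Clause tree that covers the whole token list."""
    return [tree for tree, end in parse("Clause", tokens, 0) if end == len(tokens)]


if __name__ == "__main__":
    for tree in parse_clause("bara elohim et ha shamayim".split()):
        print(tree)
```

Because the only Clause rules begin with a verb phrase, the sketch accepts the verb-initial order and rejects the same words rearranged into subject-verb-object order. That is the descriptive trade-off discussed in this report: the rule states the surface order directly rather than deriving it from a deeper structure.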
When I had to begin changing my rules in order to make the grammar match well with the parser, I began to understand the tension between generic theory and specific application. I could take a particular theory (like GB) and use it to explain the phenomena within Hebrew, but I found that it failed to account for all observed data (for instance, it didn’t handle VSO order very well). Or I could build a purely descriptive grammar that would handle the observed data very well, but would do little to describe the deeper processes that are obscured by the surface structure of a particular sentence. I tried to balance both of these concerns, sticking to the basic overall ideas of the GB theory as described in Andrew Radford’s Syntax: A Minimalist Introduction (1997) while producing a grammar that would work within the confines of my parser. It wasn’t always easy.
For instance, according to Radford, determiners (like the) are the heads of most noun phrases in English, because they determine if the phrase is definite and they can provide grammatical agreement information.1 Initially one could make the same generalization in Biblical Hebrew. However, when one studies the data from construct phrases in Hebrew, Radford’s minimalist theory (extrapolated from GB theory) begins to weaken. A construct phrase in Hebrew consists of two or more nouns, with the first having a special “construct” form. The relationship between the two nouns is roughly equivalent to the use of the English preposition of to describe possession. The initial noun “belongs” to the following noun, as in ben hamelek or “the son (ben) of the (ha-) king (melek)”. In construct phrases, the information about definiteness (the choice between English a/an and the) is contained in the final element of the phrase. However, the agreement information that allows the phrase to interact with other elements of a clause is contained in the initial element of the phrase. As a result, it is hard to state which half of the phrase governs the other half, since information that is essential to the sentence as a whole is carried in both elements of a construct phrase. So, in order to account for this data, I had to discard some of GB theory’s basic ideas in specific contexts.
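The split described above can be sketched as a small feature computation in Python. This is only an illustration of the idea, not the representation used in my grammar. The Noun class, its field names, and the function construct_phrase_features are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    """A noun with the few features needed here (hypothetical structure)."""
    form: str
    gender: str              # e.g. "m" or "f"
    number: str              # e.g. "sg" or "pl"
    definite: bool = False
    construct: bool = False  # True if the noun is in the construct form


def construct_phrase_features(nouns):
    """Compute phrase-level features for a construct chain.

    Agreement (gender, number) comes from the initial noun, while
    definiteness comes from the final noun, so neither element alone
    supplies everything the rest of the clause needs.
    """
    if len(nouns) < 2:
        raise ValueError("a construct phrase needs at least two nouns")
    if not all(n.construct for n in nouns[:-1]):
        raise ValueError("every noun except the last must be in construct form")
    first, last = nouns[0], nouns[-1]
    return {
        "gender": first.gender,
        "number": first.number,
        "definite": last.definite,
    }


if __name__ == "__main__":
    ben = Noun("ben", gender="m", number="sg", construct=True)
    hamelek = Noun("hamelek", gender="m", number="sg", definite=True)
    print(construct_phrase_features([ben, hamelek]))
```

Reading the features off both ends of the phrase avoids having to name a single head, which is exactly where a strict head-driven account runs into trouble.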
I also ran into the problem of overproduction with my grammar. For instance, Biblical Hebrew has a “verbless” clause that accepts a number of different types of predicates (nominal, adjectival, adverbial, and infinitival). As may be expected, these clauses are handled differently than clauses of the finite-verb variety. However, these verbless clauses do tend to be handled in the same basic manner. In order to handle these predicates I developed several new rules and a new category specifically for such predicates. But as a result many more things could now be defined as a predicate. The parser overgenerated some of these forms. This is not a problem with the parser, as it also generated the correct forms and only accepted the proper distribution of grammatical constituents in the end. But the other forms generated sometimes made it confusing to read the output of the parser and took up processing time. It is hoped that with some further development of the parser, new constraints and filters may be added to the grammar that would cut down on this kind of background noise, as it were.
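One simple kind of filter is a post-processing step that hides every constituent except complete clause analyses. The Python sketch below assumes the parser exposes its results as a list of labeled spans. The Constituent type, the function complete_analyses, and the sample labels are hypothetical and do not describe the actual parser’s output format. A filter like this would tidy the output but would not recover the processing time spent building the extra forms.

```python
from typing import List, NamedTuple


class Constituent(NamedTuple):
    """A labeled span over the input tokens (hypothetical output format)."""
    label: str
    start: int  # index of the first token covered
    end: int    # index one past the last token covered


def complete_analyses(chart: List[Constituent], n_tokens: int,
                      goal: str = "Clause") -> List[Constituent]:
    """Keep only constituents labeled `goal` that span the whole input.

    Partial or overgenerated constituents (for example, extra predicate
    candidates) stay in the chart but are left out of the report.
    """
    return [c for c in chart
            if c.label == goal and c.start == 0 and c.end == n_tokens]


if __name__ == "__main__":
    chart = [
        Constituent("Predicate", 1, 2),  # overgenerated candidate
        Constituent("Predicate", 1, 3),
        Constituent("Clause", 0, 2),     # partial clause
        Constituent("Clause", 0, 3),     # complete analysis
    ]
    print(complete_analyses(chart, n_tokens=3))
```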
In conclusion, I was able to create a program that accurately parsed data in Biblical Hebrew, which was my original goal. For instance, it was smart enough to see that while Genesis 1:1 can be considered one grammatical whole, verse two is actually three separate clauses joined together. Each clause contributes to the overall themes of the verse, but each is clearly a thought separate from the others. There is no reason to assume that they are one grammatical unit when neither context, content, nor grammar makes such an assumption necessary.
I would like to thank the ORCA office and Dr. Lonsdale for the help they gave me on this project. Working on the parser was fascinating and I enjoyed it thoroughly. It helped me to get a better understanding of the field of Semitic philology and of how I can use computational linguistics within that field. I intend to continue work on the parser, as it isn’t quite descriptive enough yet, and I intend to apply what I learn to other Semitic languages in the future.