Nate Blaylock and Dr. Deryle Lonsdale, Linguistics
The topic of this research, by way of review, was to try to predict, using the Analogical Modeling of Language (AML) method, the formation of Japanese loanwords from English. In other words, given an English word, how would a Japanese speaker pronounce it when adopting it into Japanese? (For example, ice cream is pronounced aisu kurimu.) The project was intended as a test of the strengths and weaknesses of AML for this kind of task (the task itself has already been attempted by others using other methods).
My findings below indicate that AML does quite well at predicting the correct sound patterns (83.59% accuracy). However, the current prediction system was not very robust: when it fails, it fails badly, predicting sounds that are intuitively far from the correct ones, so that the result is sometimes not even recognizably close to the actual Japanese loanword. I believe that further research could resolve these problems and yield much better predictions. My research is, thus far, only preliminary, and there are many different system settings and variable choices that could produce more accurate results.
System Configuration and Variables
AML works (roughly) by taking a large set of “patterns” (in this case, sequences of sounds) and then using the patterns we already know to make analogies that predict patterns we haven’t seen before. As my variables (what each pattern consists of), I chose the sound in question plus the five sounds before and after it. The system then goes through one sound at a time and says, in effect, “The sound in English is X, so the equivalent sound in Japanese in this environment must be Y.” For the technically inclined: I ran with the gang effect squared and excluded nulls and givens.
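The windowed encoding described above can be sketched in a few lines of Python. This is my own illustration, not the AML software itself: the padding symbol and function names are hypothetical, and a real run would feed these patterns into the AML engine rather than print them.

```python
PAD = "_"  # hypothetical placeholder for positions past a word boundary

def context_windows(sounds, before=5, after=5):
    """Yield one pattern per sound: the five sounds before it,
    the sound itself, and the five sounds after it."""
    padded = [PAD] * before + list(sounds) + [PAD] * after
    for i in range(len(sounds)):
        # Slice out an 11-symbol window centered on sound i.
        yield tuple(padded[i : i + before + 1 + after])

# Example: a rough three-sound rendering of "ice" ("a", "I", "s").
for pattern in context_windows(["a", "I", "s"]):
    print(pattern)
```

Each word of n sounds thus produces n patterns, so the 838 words in the data set yield one pattern per sound (7602 in all).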
The data set consisted of 838 English words (7602 sounds), their pronunciations, and their equivalent Japanese pronunciations. I took 105 words (963 sounds) as test cases and used the rest (733 words, 6639 sounds) as the AML data set.
Results
The following are the results of the run on the configuration shown above. AML usually predicts several different outcomes with different probabilities; I considered the outcome with the highest probability to be the one “predicted.” A word is counted as incorrect if one or more of its sounds was incorrectly predicted. The sound-level figure lumps all sounds together, without regard to which word they appeared in.
Correctly predicted words: 25/105 (23.81% accuracy)
Correctly predicted sounds: 805/963 (83.59% accuracy)
Wrongly predicted sounds where the correct prediction was in the output set: 139/158 (87.97%)
    of those, probability of the correct outcome was higher than 10%: 72/139 (51.80%)
    of those, probability was higher than 20%: 33/139 (23.74%)
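The two accuracy figures above are computed differently, and a small sketch makes the distinction concrete. The data here is a toy illustration of my own, not the actual predictions: a word counts as correct only if every one of its sounds was predicted correctly, while the sound-level figure pools all sounds across all words.

```python
def score(words):
    """words: list of words, each a list of (predicted, actual) sound pairs.
    Returns (word-level accuracy, sound-level accuracy)."""
    # A word is correct only if all of its sounds match.
    correct_words = sum(all(p == a for p, a in word) for word in words)
    # Sound-level accuracy pools every sound, regardless of word.
    sounds = [pair for word in words for pair in word]
    correct_sounds = sum(p == a for p, a in sounds)
    return correct_words / len(words), correct_sounds / len(sounds)

# Toy example: one fully correct word, one word with a single wrong sound.
word_acc, sound_acc = score([
    [("a", "a"), ("i", "i")],               # fully correct
    [("k", "k"), ("u", "o"), ("r", "r")],   # one wrong sound
])
print(word_acc, sound_acc)  # 0.5 at the word level, 0.8 at the sound level
```

This is why the word-level figure (23.81%) can be so much lower than the sound-level figure (83.59%): a single wrong sound sinks the whole word.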
Basically, this means that when you look only at individual sounds, AML does very well at predicting which Japanese sound they will correspond to. It does not do as well on whole words, because most words contained one or two incorrectly predicted sounds. I believe that with better data (some of my data was not coded very well) and better variables, we could achieve even more accurate results.
The lower results were very interesting to me and show, I believe, the promise of AML. As mentioned above, the output of AML is not a single outcome but a whole set of possible outcomes, each with an associated probability that it is the right one. Even though AML mispredicted almost 17% of the sounds, in 87.97% of those cases the right answer was in the output set; it just was not the result with the highest probability. In fact, over 50% of those results had a probability of 10% or higher (which is high for AML).
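The analysis behind those figures can be sketched as follows. The dictionary-of-probabilities representation and the function name are my own illustration of the idea, not AML's actual output format: even when the top candidate is wrong, we can check whether the correct sound appears lower in the output set and with what probability.

```python
def analyze(output_set, actual):
    """output_set: dict mapping a candidate sound to its probability.
    Reports whether the top prediction was right, whether the correct
    sound appeared anywhere in the set, and its probability."""
    top = max(output_set, key=output_set.get)  # highest-probability candidate
    return {
        "top_correct": top == actual,
        "in_set": actual in output_set,
        "prob_of_actual": output_set.get(actual, 0.0),
    }

# Toy example: the wrong sound wins, but the right one still gets 12%.
result = analyze({"Na": 0.55, "fo": 0.12, "ho": 0.33}, actual="fo")
print(result)
```

Counting cases like this one, where "in_set" is true despite "top_correct" being false, is how the 139/158 (87.97%) figure above was reached.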
One of the most severe problems, as I mentioned above, is that when AML incorrectly predicts a sound, it usually predicts something that would make the resulting word incomprehensible to a native Japanese speaker. For example, the word aphorism was predicted as aNarizumu instead of the correct aforizumu: the /fo/ sound (actually two different sounds in the system) was predicted to be /Na/. Even though the other six sounds were predicted correctly, the result is still incomprehensible.
Conclusion
This research project has been a good experience for me. It has taught me not to try to accomplish everything at once, but to take research one step at a time. I have not yet completed the academic paper that I proposed to write with this research, but I hope to present the work at an upcoming conference at BYU, in which case I will finish the paper and publish my results there. I feel that these are important results nonetheless: the project has shown (at least to me) some of the strengths and weaknesses of AML, which will help me, and others, see how best to use it in future projects.