Michael Welker, Department of Audiology and Speech Pathology
Introduction
Visemes (phonemes that look the same on the lips) in the English language are extremely difficult for hearing-impaired individuals who rely on speech-reading to distinguish. Contemporary spectral displays of speech cannot reliably reveal the subtle differences in the acoustic signal that separate these sounds. Our project investigated novel ways to display the spectrum in order to emphasize these subtleties.
Procedure
The project was divided into three main phases. The first phase involved developing adequate instrumentation. The main problem was writing a computer program that would display the speech data as a trispectral analysis. Eric Hunter, a senior majoring in physics, created a program that first analyzes speech signals with a Fast Fourier Transform (FFT) routine. The speech spectrum is then divided into three frequency bands: low, middle, and high. The program plots the data as a point within an equilateral triangle, with the three axes representing the percentage of energy contained within the respective bands. The amplitude of the signal is shown by the degree of color saturation. Similarities and differences among the spectra are then shown over time by recording the trajectories of these points. The spatial location of the dots is very sensitive to variations within the speech signal, so displaying the spectrum in this manner exaggerates small differences.
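The core of the trispectral display can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original program: the radix-2 FFT, the `(f0, f1, f2, f3)` band-edge tuple, and the vertex layout (low at lower left, high at lower right, middle at the apex) are choices made here for clarity. Each frame's energy in the three bands is converted to percentages, which then act as barycentric coordinates of a point inside the triangle.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("frame length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def band_percentages(frame, sample_rate, edges):
    """Percent of spectral energy in the (low, middle, high) bands.

    edges = (f0, f1, f2, f3) in Hz is an assumed layout:
    low = [f0, f1), middle = [f1, f2), high = [f2, f3).
    """
    n = len(frame)
    spectrum = fft(frame)
    energy = [0.0, 0.0, 0.0]
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        freq = k * sample_rate / n
        power = abs(spectrum[k]) ** 2
        for band in range(3):
            if edges[band] <= freq < edges[band + 1]:
                energy[band] += power
                break
    total = sum(energy)
    if total == 0.0:
        # Silent frame: no energy to apportion, so place it at the centroid.
        return (100.0 / 3, 100.0 / 3, 100.0 / 3)
    return tuple(100.0 * e / total for e in energy)


def triangle_point(percentages):
    """Map (low, middle, high) percentages to (x, y) in an equilateral triangle.

    Assumed vertex layout: low = (0, 0), high = (1, 0), middle = (0.5, sqrt(3)/2).
    The percentages act as barycentric weights on those three vertices.
    """
    low, mid, high = (p / 100.0 for p in percentages)
    x = 0.0 * low + 0.5 * mid + 1.0 * high
    y = (math.sqrt(3) / 2) * mid
    return x, y
```

Passing successive frames of a recording through `band_percentages` and then `triangle_point` yields the sequence of dots whose trajectory is displayed.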
The second phase of the project involved collecting the data to be analyzed. Because of time constraints, the project was limited to plosive + vowel combinations produced by a single speaker (a male from a home where only General English was spoken). The data were recorded in an anechoic chamber to eliminate the adverse effects of reverberation and then stored on digital tape. The speech signals analyzed came from ten productions of the plosive + vowel combinations in isolation and ten productions of the combinations in continuous speech.
The final phase of the project was the analysis of the data. The program developed for this project allows the user to control many of the variables, including the frequency range of each band, the manner in which the data are displayed, and the sensitivity to amplitude. The data were therefore displayed and re-displayed with different combinations of variables in order to determine (1) whether the subtle differences could be displayed and (2) if so, which combination would provide the best differentiation.
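One way to organize the user-adjustable variables, and the repeated re-display under different combinations of them, is sketched below. The `DisplaySettings` fields, the decibel-based `saturation` rule, and the `settings_grid` helper are hypothetical illustrations of the band frequency ranges and amplitude sensitivity; the original program's variables are not documented here, and the display-manner options are omitted.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class DisplaySettings:
    """Hypothetical bundle of user-adjustable display variables."""
    low_mid_edge_hz: float           # boundary between the low and middle bands
    mid_high_edge_hz: float          # boundary between the middle and high bands
    amplitude_sensitivity_db: float  # level range mapped onto full saturation


def saturation(rms, ref_rms, sensitivity_db):
    """Map a frame's RMS amplitude to a color saturation in [0, 1].

    Assumed rule: a frame at ref_rms is fully saturated, and saturation falls
    linearly in decibels to zero at sensitivity_db below ref_rms. A smaller
    sensitivity_db makes the display react more sharply to amplitude changes.
    """
    if rms <= 0.0:
        return 0.0
    level_db = 20.0 * math.log10(rms / ref_rms)
    return min(1.0, max(0.0, 1.0 + level_db / sensitivity_db))


def settings_grid(low_mid_options, mid_high_options, sensitivity_options):
    """List every valid combination of settings to display and compare in turn."""
    return [
        DisplaySettings(low_mid, mid_high, sens)
        for low_mid, mid_high, sens in product(
            low_mid_options, mid_high_options, sensitivity_options
        )
        if low_mid < mid_high  # band edges must be increasing
    ]
```

Each entry in the grid would be applied to the same recordings and the resulting trajectories compared. The report describes this comparison as repeated display and inspection, not as an automated score, so no ranking function is sketched here.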
Results
Manipulating the variables produced distinct trajectories from which the plosive + vowel combinations could be distinguished. The frequency ranges of the bands appeared to be the most critical factor in creating distinguishable patterns. These patterns can be viewed on any computer capable of running the trispectral program (written in C++) and were displayed at the Office of Research and Creative Works recognition awards on March 28, 1995, at Brigham Young University.
Research Questions/Problems
While the data were being gathered, it quickly became apparent that analyzing all the plosive + vowel combinations would not be possible within the time frame of the project. Consequently, a decision had to be made about which consonants and vowels to use. It was decided to use only the visemes /p/, /b/, and /m/ with the cardinal vowels. If acoustic differences could be reliably differentiated on the new spectral display, the project could then be extended to include more variables.
The second major problem, unforeseen at the outset, was how to select the frequency bands. The goal was to select bands that would create maximal differences in both these data and future data. The decision was a hybrid solution. The first part was to make the frequency bands a variable selected by the user, so that any combination could be used. The other part was to use a multiplier function that divides the data into bands of equal intelligibility. A further consideration was the method of analyzing the speech signal with the Fast Fourier Transform, and over what time duration. These were likewise made variables that the user can set according to research preference. In this way, the computer program remains flexible for phoneme variations.
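The report does not specify the multiplier function or the analysis window, so the following is only a hedged sketch. It assumes a multiplier rule in which each band edge is a constant multiple of the previous one (equal widths on a logarithmic frequency axis), a rough stand-in for bands of equal intelligibility, and it assumes a Hann window with a user-chosen frame length and hop in milliseconds. `multiplier_band_edges` and `analysis_frames` are hypothetical names.

```python
import math


def multiplier_band_edges(f_low, f_high, n_bands=3):
    """Hypothetical multiplier rule for band edges.

    Each edge is the previous edge times a constant ratio, so the bands have
    equal widths on a logarithmic frequency axis. This is an assumed stand-in
    for the report's 'equal bands of intelligibility', not its actual function.
    """
    ratio = (f_high / f_low) ** (1.0 / n_bands)
    return [f_low * ratio ** i for i in range(n_bands + 1)]


def analysis_frames(signal, sample_rate, frame_ms, hop_ms):
    """Cut a signal into Hann-windowed frames (window choice is an assumption).

    frame_ms sets the analysis duration of each FFT frame; hop_ms, an added
    assumption, sets the step between successive frame starts.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    if frame_len < 2 or hop < 1:
        raise ValueError("frame must span at least 2 samples and hop at least 1")
    window = [
        0.5 - 0.5 * math.cos(2.0 * math.pi * i / (frame_len - 1))
        for i in range(frame_len)
    ]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames
```

For example, `multiplier_band_edges(100.0, 6400.0)` uses a ratio of 4 and gives edges of approximately 100, 400, 1600, and 6400 Hz. A closer match to equal intelligibility would weight frequencies by a band-importance function, such as those tabulated for the Speech Intelligibility Index, and place the edges where cumulative importance reaches one third and two thirds.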
Suggested Topics for Continued Research
As with most research projects, more questions were generated than were answered. In light of this first step, however, further research in this area is recommended, and the following suggestions may interest future investigators. One suggestion is a reliability study of both inter-judge and intra-judge variables, as well as of differences produced by a single speaker over time. Another is to investigate the effects of stress or emotion on the acoustic signal as viewed through this new program. Other recommended areas of investigation include how the acoustic trajectories generalize across age, sex, and multiple speakers. A final, and certainly necessary, recommendation is to investigate additional sound combinations.