Alane Izu and Dr. David G. Whiting, Statistics
Introduction
The premedical committee at Brigham Young University exists to assist students in successfully applying to medical school. The committee consists of professors from various departments across campus as well as senior premedical students. Two different committee members interview each eligible student in March and April of the year before the student plans to enter medical school. The interviewers rate the student on a continuous scale from one to five in nine different areas: academic performance, MCAT scores, letters of recommendation, written and personal presentation, personal characteristics, reasons for choosing a medical career, extracurricular activities, involvement in community service, and demonstration of leadership. The nine ratings are averaged to give an overall evaluation score. The committee then writes a letter of recommendation based primarily on this score, but also including other insights gained through the interviews.
The purpose of our research is to investigate the rating process of the premedical committee. By gaining a better understanding of which factors may influence the rating of premedical students, we hope to improve the interviewing process.
Data
Data were collected and entered into a spreadsheet from the committee's rating forms for the year 2000. These data contain the ratings for every student interviewed in that period. To maintain confidentiality, all student and interviewer names were replaced with unique identifying numbers. For each student, we have the ratings in each of the nine categories from each rater, the average of the nine category ratings from each rater, the average of all eighteen ratings from both raters, and the final rating given to the student. We will call the average of the nine ratings from one rater the evaluation average and the average of all eighteen ratings from both raters the overall average. We will refer to the two raters as Rater1 and Rater2.
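As a concrete illustration of these definitions, the following sketch computes the evaluation averages and the overall average from a hypothetical spreadsheet layout with one row per student and one column per rater-category pair. The column and file names below are our own illustrative choices, not those of the committee's actual spreadsheet.

import pandas as pd

# Hypothetical layout: one row per student, nine category ratings from each rater.
categories = [
    "academic", "mcat", "letters", "presentation", "personal",
    "career_reasons", "extracurricular", "service", "leadership",
]
rater1_cols = ["rater1_" + c for c in categories]
rater2_cols = ["rater2_" + c for c in categories]

ratings = pd.read_csv("ratings_2000.csv")  # hypothetical file name

# Evaluation average: mean of one rater's nine category ratings.
ratings["eval_avg_rater1"] = ratings[rater1_cols].mean(axis=1)
ratings["eval_avg_rater2"] = ratings[rater2_cols].mean(axis=1)

# Overall average: mean of all eighteen ratings from both raters.
ratings["overall_avg"] = ratings[rater1_cols + rater2_cols].mean(axis=1)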
Collecting the data was a lengthy process that had to be completed before this research could move forward. Each student's ratings were recorded on a paper form; copies of these forms were made and given to us to enter into the computer. The data entry was not difficult, but it was time consuming and had not been planned for.
Analysis
Histograms were made of each of the nine categories, the evaluation average, the overall average, and the final rating for each rater. These helped determine the range of the ratings given by each rater. Subjects with missing values were omitted.
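Continuing the sketch above (and assuming a hypothetical final_rating column), the histograms might be produced as follows; as in the analysis just described, subjects with missing values are dropped first.

import matplotlib.pyplot as plt

# Drop subjects with any missing rating, as in the analysis described above.
complete = ratings.dropna(subset=rater1_cols + rater2_cols + ["final_rating"])

# One histogram per category rating, plus the averages and the final rating.
for col in rater1_cols + rater2_cols + ["eval_avg_rater1", "eval_avg_rater2",
                                        "overall_avg", "final_rating"]:
    plt.figure()
    plt.hist(complete[col], bins=20, range=(1, 5))
    plt.title(col)
    plt.xlabel("rating")
    plt.ylabel("number of students")
plt.show()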
Scatterplots of the ratings from Rater1 against the ratings from Rater2 were made for each of the nine categories and for the evaluation average to assess how similarly two different raters rated the same student. Correlations close to one would indicate that the rater does not affect a student's ratings. Again, subjects with missing values were omitted.
A scatterplot of the overall average against the final rating was made to examine the difference between the actual average of a student's ratings and the rating given to the student. A correlation close to one would mean that the nine category ratings are strongly associated with the final rating. All graphs were made using S-Plus.
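The original plots were made in S-Plus; a rough equivalent, continuing the Python sketch above, is the following.

# Rater1 vs. Rater2 scatterplot and correlation for each category.
for c in categories:
    x, y = complete["rater1_" + c], complete["rater2_" + c]
    r = x.corr(y)  # Pearson correlation; near 1 would suggest the raters agree
    plt.figure()
    plt.scatter(x, y)
    plt.title(f"{c}: r = {r:.2f}")
    plt.xlabel("Rater1")
    plt.ylabel("Rater2")

# Overall average against the final rating.
r = complete["overall_avg"].corr(complete["final_rating"])
plt.figure()
plt.scatter(complete["overall_avg"], complete["final_rating"])
plt.title(f"overall average vs. final rating: r = {r:.2f}")
plt.xlabel("overall average")
plt.ylabel("final rating")
plt.show()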
A principal component analysis was performed in SAS on the eighteen ratings (nine from each rater). This analysis redistributes the total variance of the eighteen variables across orthogonal components. We did this to see whether some variables contributed little or nothing to the student's final rating.
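The original analysis was done in SAS; a rough equivalent, continuing the Python sketch above with scikit-learn, is shown below. The ratings are standardized first so the analysis works on their correlation structure; this is a sketch of the general technique, not a reproduction of the SAS output.

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Principal component analysis on the eighteen category ratings (nine per rater).
X = complete[rater1_cols + rater2_cols].to_numpy()
X_std = StandardScaler().fit_transform(X)  # standardize each rating to mean 0, variance 1
pca = PCA().fit(X_std)

# Proportion of the total variance explained by each principal component.
for i, prop in enumerate(pca.explained_variance_ratio_, start=1):
    print(f"PC{i}: {prop:.1%} of total variance")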
Results
There was nothing unusual about the histograms made for each rater. The histograms were all approximately normal, ranging from about 3 to 5 with occasional outliers. There was no specific category in which raters tended to give lower or higher ratings.
The scatterplots of the ratings from Rater1 against the ratings from Rater2 showed interesting results. There were many outliers, and the correlations were not close to one. This means that, for some students, Rater1 gave a rating of around 3 while Rater2 gave a rating of around 5. The reason for these differences would be a good area for further research.
The scatterplot of the overall average against the final rating was also interesting. For students whose final rating was a three, the overall average mostly ranged from 3.5 to 4. This is another area that should be explored further.
The principal component analysis showed that the first ten principal components contained 90% of the variance; the remaining eight components each contained only 1% to 2% of the variance.
Conclusions
Although we were not able to accomplish all that we hypothesized, we found results that open new doors for research. Why do two raters give the same student largely different ratings? Why are some students' overall averages so different from their final ratings? This research project was also a great experience for me, as I was able to see one of the many applications of statistics to society.