Christopher R. Layton, Melvin J. Carr, Associate Director of ORCA
Each year the Office of Research and Creative Activities (ORCA) awards $1,000 scholarships to students who propose research and creative projects. These students represent each of the colleges at BYU. The idea is to compare the proposals with one another and select those most deserving of funding. This is difficult to accomplish for several reasons. The main problem is that we are comparing apples to oranges: research projects from the engineering department and creative projects from the music department are not easy to weigh against each other. Another problem is obtaining a representative score for each proposal. Why are these problems hard to solve?
Comparing projects from two different disciplines is difficult because it requires evaluation criteria common to both disciplines, or to all disciplines in our case, since we want a university-wide competition. This involves not only how we evaluate the proposals but also whom we ask to evaluate them. The evaluation procedure needs to be one that can be applied to all of the colleges. Over the past several years, trends have emerged among the departments: some tend to score proposals higher than others. If we use raw scores, the departments that tend to score higher have an advantage.
All of this leads us to believe that we need a procedure that estimates the true merit of a proposal relative to projects from the same discipline and from other disciplines. We also need to make sure that the evaluation procedure we choose ranks the projects fairly. These two goals are equally important if we are to choose the best proposals: even a sound evaluation method is of little use if we cannot estimate the correct score of each proposal, because then we cannot be sure we have selected the best ones.
The current method of distributing the ORCA awards works in two rounds. The first round takes place within each of the colleges. Each college has its proposals ranked by its own faculty. The proposals are given a score, and the top 25% in each college are funded. This guarantees that at least this portion of the awards is distributed equally among the colleges. This completes round one.
Round two takes the next 15% from each college (those just below the top 25% that received awards) and has them reevaluated by a different but similar college. For example, the College of Engineering might have its second-round proposals evaluated by the College of Physical Sciences. After these proposals receive new scores, they are compared with one another, which makes the second round a university-wide competition. The proposals are ranked from best to worst on these new raw scores, and ORCA continues to fund the highest-ranking projects until its funds are exhausted.
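To make the cutoffs concrete, here is a minimal Python sketch of the 25% and 15% rules. The proposal count is hypothetical, and the article does not say how fractional cutoffs are rounded, so the sketch simply rounds down.

import math

def round_cutoffs(num_proposals):
    """Return (proposals funded in round one, proposals forwarded to round two)."""
    funded = math.floor(num_proposals * 0.25)      # top 25% funded within the college
    forwarded = math.floor(num_proposals * 0.15)   # next 15% sent on to round two
    return funded, forwarded

# Example: a college that submits 40 proposals
print(round_cutoffs(40))   # (10, 6): 10 funded outright, 6 advance to round two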
The major problems with this method are those discussed above: the raw scores are hard to compare from college to college. How can this method be improved? One of the easiest ways to make the scores comparable is to train each of the evaluating professors in how to score the proposals. Although all of them receive the same instruction packet, it is easy to misinterpret the instructions, and some of the professors have never evaluated an ORCA proposal before, so they have no past experience on which to base their evaluation.
The mathematical solution to this incompatibility of raw scores is called standardization. This method assumes that each professor evaluates the proposals in the same way (a few high scores, a few low scores, and many scores in between) but on a different scale, much like grading on a curve. Once we make this assumption, we can standardize the scores so that they are all on the same scale: each reviewer's scores are rescaled by subtracting that reviewer's mean score and dividing by that reviewer's standard deviation. Standardizing matters most in the second round, when all colleges are being compared with one another.
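As a minimal sketch, assuming that standardization here means the usual z-score transformation applied to each reviewer's scores, the following Python fragment shows how a lenient reviewer and a strict reviewer end up on the same scale. All scores are hypothetical.

from statistics import mean, pstdev

def standardize(scores):
    """Rescale one reviewer's raw scores to a common (z-score) scale."""
    m = mean(scores)
    s = pstdev(scores)
    return [(x - m) / s for x in scores]

# Two reviewers score the same five proposals, one leniently and one strictly.
lenient = [92, 88, 95, 90, 85]
strict = [72, 68, 75, 70, 65]
print(standardize(lenient))
print(standardize(strict))   # identical standardized scores: the scales now agree

Both reviewers produce the same standardized scores, so a proposal's standing no longer depends on which reviewer happened to grade it.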
Another possible method of evaluation uses a computer program called LARC (Larsen-Allen Ranking by Computer). This program orders proposals by comparing them with one another. The basic procedure for using LARC is the same as the current method: a reviewer evaluates a number of proposals, but instead of scoring each one, the reviewer simply ranks them from best to worst. After a proposal has been reviewed by several judges, LARC needs to know only how many and which proposals it was reviewed against, how many of those were better, and how many were worse. Although it is impractical to compare every proposal against every other proposal, LARC takes these incomplete comparisons into account and ranks the proposals accordingly.
LARC can be used for both the first- and second-round evaluations. After LARC is run, it prints a score for each proposal indicating how well that proposal compares with the others; the higher the score, the better. Unlike standardized scores, these scores cannot be compared from one run of LARC to the next. If one department uses LARC to evaluate its proposals, it cannot use the resulting scores to compare its proposals with those of another department that also uses LARC. To compare the proposals of two or more departments, all of the rankings must be processed in the same run; LARC cannot compare proposals that were evaluated in separate runs of the program.
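LARC's internal algorithm is not described here, so the Python sketch below is only an illustration of how partial pairwise information of the kind LARC uses (which proposals each one was compared against, and how many of those were better or worse) can be combined into a single ordering. In this simplified version each proposal's score is the fraction of its comparisons that it won; LARC itself may compute something more sophisticated, and, as with LARC, the scores are only meaningful within a single run. The proposal names and reviewer rankings are hypothetical.

from collections import defaultdict

# Each reviewer ranks a handful of proposals from best to worst.
reviewer_rankings = [
    ["A", "C", "B"],   # reviewer 1: A beat C and B, C beat B
    ["B", "D"],        # reviewer 2: B beat D
    ["A", "D", "C"],   # reviewer 3: A beat D and C, D beat C
]

wins = defaultdict(int)
comparisons = defaultdict(int)
for ranking in reviewer_rankings:
    for i, better in enumerate(ranking):
        for worse in ranking[i + 1:]:
            wins[better] += 1
            comparisons[better] += 1
            comparisons[worse] += 1

# Score each proposal by the fraction of its comparisons it won,
# then list the proposals from best to worst.
scores = {p: wins[p] / comparisons[p] for p in comparisons}
for proposal, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(proposal, round(score, 2))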
We have had the opportunity to compare LARC and standardizing on small simulated data sets, but never on any real data. Next year (Fall 1998), the ORCA scholarship proposals will be evaluated using both LARC and the standardizing method. We will be able to test how easy each method is to use and how different the results are, which will allow us to judge which one best suits the needs of the ORCA evaluation committee.