Ted Piorczynski and Jamie Jensen, Biology
Introduction
The Lawson Classroom Test of Scientific Reasoning (LCTSR) is a 24-question, multiple-choice test designed to assess students’ scientific reasoning ability. The test consists of 12 scenarios, each of which tests a specific reasoning pattern. Each scenario is followed by two questions: the first assesses a student’s ability to apply the reasoning pattern being tested, and the second asks the student to explain the reasoning behind his or her response to the first. This dual-question approach is designed to verify that students are employing the correct reasoning patterns to reach the correct answers.
The LCTSR has traditionally been graded on a 24-point scale, awarding one point for each correct response. Researchers soon realized, however, that students were receiving one point for answering the first question of a scenario correctly even when they answered the second question incorrectly, and vice versa. This meant that students were receiving points on their overall test scores even though they were not able to explain the reasoning behind their choices. To counteract this problem, researchers began grading the LCTSR on a 12-point system instead of the traditional 24-point scale. The 12-point system treats the two questions from the same scenario as one response; to receive one point, a student must answer both questions correctly. The goal of this project is to test whether student scores differ significantly when the LCTSR is graded by the traditional 24-point scale versus the new 12-point method.
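The two grading schemes described above can be sketched in a few lines of Python. This is only an illustration, not the scoring procedure used in the study: the function names are hypothetical, and the sketch assumes that responses are stored in test order so that each consecutive pair of items belongs to the same scenario.

```python
from typing import Sequence


def score_24_point(responses: Sequence[str], key: Sequence[str]) -> int:
    """Traditional scale: one point for every correctly answered question."""
    return sum(r == k for r, k in zip(responses, key))


def score_12_point(responses: Sequence[str], key: Sequence[str]) -> int:
    """Paired scale: one point per scenario, only if BOTH questions are correct.

    Assumption: items 2i and 2i+1 are the two questions of scenario i.
    """
    points = 0
    for i in range(0, len(key) - 1, 2):
        both_correct = responses[i] == key[i] and responses[i + 1] == key[i + 1]
        points += both_correct
    return points
```

For a toy two-scenario key `["a", "b", "c", "d"]` and the responses `["a", "b", "c", "a"]`, the traditional scale awards 3 points, while the paired scale awards 1, because the second scenario's explanation question was missed.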
Methodology
Over the past year, we have administered the LCTSR to over 800 BYU students and graded the test using both the 12- and 24-point scales. We saw no significant difference between the scores under the two grading techniques, possibly because the population we tested was too homogeneous. We decided to sample a new population with more heterogeneous LCTSR scores so that any difference in students’ scores between the grading techniques would be easier to detect and so that we could rate the effectiveness of each scoring scheme more accurately. As a result, the LCTSR was administered to approximately 400 students of various majors at Utah Valley University. The test was taken online through the university’s course management system, and the scores were graded using both point scales.
To determine the accuracy of each scoring method, we looked at the ability of both the 12- and 24-point scales to predict ACT scores. Since the ACT is a measure of achievement and achievement is correlated with scientific reasoning skills, the ACT should serve as a loose alternative measure of ability. We therefore aimed to measure how well the LCTSR predicts ACT scores and to see whether the 12-point or the 24-point method is the better predictor. All participants in the study gave written consent for researchers to pull their ACT scores from their university’s registrar’s office. After obtaining the ACT scores, we used the SPSS statistical package to run two linear regressions, one with the 12-point scores and one with the 24-point scores as the predictor variable, each with ACT scores as the outcome variable.
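The analysis itself was run in SPSS, but the quantities a one-predictor linear regression reports can be sketched in Python to show what is being compared. The function name below is hypothetical, and the sketch assumes ordinary least squares with a single predictor (an LCTSR score) and a single outcome (an ACT score).

```python
from typing import Sequence, Tuple


def simple_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Ordinary least squares with one predictor.

    Returns (slope, intercept, r_squared, f_statistic).
    Assumes at least three observations and a fit that is not perfect,
    since the F statistic divides by (1 - r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of variance in y explained by x
    f_statistic = r_squared / (1 - r_squared) * (n - 2)  # F with 1 and n - 2 df
    return slope, intercept, r_squared, f_statistic
```

Calling this function once with the 12-point scores and once with the 24-point scores as `x`, with ACT scores as `y` in both cases, would give the two R² values that the comparison rests on.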
Results
The 12-point grading method predicts 30.4% of the variance in ACT scores (R = .551, R² = .304, F = 441.54, p < .001). The 24-point grading method predicts 29.1% of the variance in ACT scores (R = .540, R² = .291, F = 415.32, p < .001). Each grading scale yields a significant predictor of ACT scores, but neither appears to be substantially more accurate than the other. We also ran a reliability analysis of the LCTSR. The 12-point grading method gave a Cronbach’s alpha of .68. The established reliability of the original 24-point LCTSR is .81 (Lawson, 2000); on the 24-point scale, our data gave a Cronbach’s alpha of .82.
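For readers unfamiliar with the reliability statistic, Cronbach’s alpha for k items is k/(k − 1) × (1 − sum of item variances / variance of total scores). A minimal Python sketch follows, assuming each student’s item scores are stored as a row of 0s and 1s; the function name is hypothetical and this is not the SPSS procedure used in the study. Under the 24-point scale the items would be the 24 questions, and under the 12-point scale they would be the 12 paired scenario scores.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a students-by-items table of scores.

    item_scores: one row per student, one column per item.
    Assumes at least two items, at least two students, and
    total scores that are not all identical (nonzero variance).
    """
    k = len(item_scores[0])                       # number of items
    columns = list(zip(*item_scores))             # transpose to items-by-students
    sum_item_variances = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)
```

Values closer to 1 indicate that the items vary together across students, which is the sense in which one grading scale can be called more reliable than the other.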
Discussion
Based on our results, it appears that using the 12-point grading method lowers the reliability of the instrument, whereas using the 24-point method yields a more reliable score. This is surprising since the 12-point grading system is designed to reduce the number of points students are awarded for guessing the correct answers.
That being said, there are aspects of our study we need to build upon to ensure we are drawing the correct conclusions. We realize that SAT scores, which measure aptitude, may be a more suitable choice for comparison than students’ ACT scores. Although we originally planned to run our analysis using both ACT and SAT scores as measures of the different grading schemes’ effectiveness, fewer students had taken the SAT than the ACT, and we simply did not have enough SAT scores to include them in our analysis. Moving forward, we will obtain more students’ SAT scores so that we can use them as a measure of grading effectiveness.
Conclusion
Our results show a difference between grading the LCTSR by the new 12-point method and the traditional 24-point scale: although both scores predict ACT scores about equally well, using the 12-point grading method lowers the reliability of the LCTSR, whereas the 24-point grading method yields a more reliable score. Moving forward, we look to obtain a larger, more heterogeneous sample while using students’ SAT scores as measures of the different grading schemes’ effectiveness.
Lawson, A. E. (2000). The generality of hypothetico-deductive reasoning: Making scientific thinking explicit. American Biology Teacher, 62(7), 482-495.