Sherstin Merx and Dr. Scott Grimshaw, Statistics
Since 1982, the Brigham Young University Departments of Statistics and Political Science have sponsored the Utah Colleges Exit Poll (UCEP) both as a research tool and as an experience in applied statistics. In designing the poll, however, UCEP routinely deals with administrative and geographic issues that make the task more difficult. For example, just collecting the necessary information from all 29 Utah counties requires the full attention of a designated “Public Relations” committee. Geography is another problem: certain counties are so remote that it is difficult and expensive to include them in the sample.
These problems occur in every election, and they raise the question of whether certain counties really need to be included in the exit poll. But by intentionally limiting our sample, we introduce bias into the sample results. The purpose of this study was to determine what effect that bias has on UCEP results.
To determine how much influence certain counties have on exit poll results, we must first determine the expected value of the estimator. If the expected value equals the true population value, then the estimator is unbiased. If it does not, then we must determine whether the bias is significant.
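A Horvitz-Thompson estimator is used in the exit poll because the complex sample design is reflected in the inclusion probabilities. That is, to estimate the proportion of Republican votes in a statewide race, the vote count from each sampled county is weighted by the inverse of that county's probability of inclusion.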
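One standard way to write such an estimator is sketched below. The notation is assumed here for illustration: s is the set of sampled counties, y_i is the number of Republican votes in county i, N is the total number of votes cast statewide, and p_i is the probability of inclusion of county i.

```latex
\hat{P} = \frac{1}{N} \sum_{i \in s} \frac{y_i}{p_i}
```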
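The Horvitz-Thompson estimator is unbiased if all counties are included in the sampling plan, that is, if every county has a nonzero probability of inclusion. In that case the expected value of each county's sample-membership indicator equals its probability of inclusion, which cancels the inverse-probability weight and leaves the true statewide value.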
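A brief sketch of that argument, under assumed notation: let I_i equal 1 if county i is in the sample and 0 otherwise, so that E[I_i] = p_i; let U denote the set of all counties, y_i the Republican votes in county i, N the statewide vote total, and P the true statewide proportion.

```latex
E[\hat{P}]
  = E\left[\frac{1}{N} \sum_{i \in U} \frac{I_i \, y_i}{p_i}\right]
  = \frac{1}{N} \sum_{i \in U} \frac{E[I_i] \, y_i}{p_i}
  = \frac{1}{N} \sum_{i \in U} y_i
  = P
```

The cancellation in the third step requires p_i > 0 for every county in U.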
Because these formulas are only defined when the probability of inclusion (p_i) is greater than zero, we are essentially redefining the boundaries of the state of Utah when we exclude counties. The result is that the Horvitz-Thompson estimator is now biased. To calculate this bias we take the difference between the actual candidate percentage and the proportion estimate after omitting counties.
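The bias calculation described above can be illustrated with a minimal Python sketch. It assumes that omitting counties redefines the target as the proportion among the remaining counties, and it reports the estimate minus the actual value. The county labels and vote counts are invented for illustration and are not Utah returns; the function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical county results: label -> (Republican votes, total votes).
# These figures are invented for illustration only.
COUNTY_VOTES: Dict[str, Tuple[int, int]] = {
    "A": (60_000, 100_000),
    "B": (30_000, 50_000),
    "C": (4_000, 5_000),
    "D": (1_500, 2_000),
}


def proportion(votes: Dict[str, Tuple[int, int]], counties: Iterable[str]) -> float:
    """Republican share of all votes cast in the given counties."""
    counties = list(counties)
    republican = sum(votes[c][0] for c in counties)
    total = sum(votes[c][1] for c in counties)
    return republican / total


def exclusion_bias(votes: Dict[str, Tuple[int, int]], excluded: Set[str]) -> float:
    """Bias from omitting counties: share among included counties minus true share.

    Assumes the omitted counties have inclusion probability zero, so the
    estimator targets the proportion within the remaining counties.
    """
    included = [c for c in votes if c not in excluded]
    if not included:
        raise ValueError("at least one county must remain in the sample frame")
    return proportion(votes, included) - proportion(votes, votes)


if __name__ == "__main__":
    bias = exclusion_bias(COUNTY_VOTES, {"C", "D"})
    print(f"bias from omitting C and D: {bias:+.4f}")  # prints -0.0083
```

In this invented example the two small omitted counties lean more Republican than the rest, so dropping them pulls the expected estimate slightly below the true statewide share.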
More importantly, we must also consider the variance of the estimator. In the exit poll, it is the estimator variance that determines whether or not an election may be called: the larger the variance, the greater the difference between candidates must be in order for a winner to be declared. The variance of the Horvitz-Thompson estimator depends on the inclusion probabilities of the individual counties and on the joint inclusion probabilities of pairs of counties.
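For reference, the textbook form of this variance can be sketched as follows. The symbols are assumptions made for illustration: U is the set of all counties with nonzero probability of inclusion, y_i is the number of Republican votes in county i, N is the statewide vote total, p_i is the probability of inclusion of county i, and p_ij is the joint probability that counties i and j are both in the sample, with p_ii = p_i.

```latex
\operatorname{Var}(\hat{P})
  = \frac{1}{N^2} \sum_{i \in U} \sum_{j \in U}
    \left( p_{ij} - p_i \, p_j \right) \frac{y_i}{p_i} \, \frac{y_j}{p_j}
```

Under this form, a county with p_i = 1 contributes nothing to the county-level variance, since p_ij = p_j for every j.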
To apply these ideas to the exit poll, we first needed to determine which counties to eliminate from the sample. To do this we classified the counties as certainty or non-certainty counties. A “certainty” county is one in which a participating school is located. Such a county is certain to be included in the sample because we know we will have volunteers in that area to staff selected polling places. A “non-certainty” county, on the other hand, is one that is not guaranteed to be included in the sample. Instead, these counties are grouped into strata, and a few of the counties within each stratum are selected for the sample. This division of counties is reflected in the probability of inclusion, p_i, of each county: p_i = 1 for certainty counties, and p_i < 1 for non-certainty counties.
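The classification above can be sketched in Python as a mapping from counties to inclusion probabilities. This is only an illustration under a stated assumption: within each stratum, counties are taken to be drawn by simple random sampling without replacement, so that p_i equals the number selected divided by the stratum size. The county labels, stratum names, and the function name are hypothetical.

```python
from typing import Dict, List


def inclusion_probabilities(
    certainty: List[str],
    strata: Dict[str, List[str]],
    n_selected: Dict[str, int],
) -> Dict[str, float]:
    """Return p_i for every county.

    Certainty counties get p_i = 1. For non-certainty counties, this sketch
    ASSUMES simple random sampling without replacement within each stratum,
    giving p_i = (counties selected in stratum) / (counties in stratum).
    """
    probs = {county: 1.0 for county in certainty}
    for stratum, counties in strata.items():
        n = n_selected[stratum]
        if not 0 < n <= len(counties):
            raise ValueError(f"invalid selection count for stratum {stratum!r}")
        for county in counties:
            probs[county] = n / len(counties)
    return probs


if __name__ == "__main__":
    # Hypothetical labels, not actual Utah counties or UCEP strata.
    probs = inclusion_probabilities(
        certainty=["C1", "C2"],
        strata={"north": ["N1", "N2", "N3", "N4"], "south": ["S1", "S2", "S3"]},
        n_selected={"north": 2, "south": 1},
    )
    for county, p in sorted(probs.items()):
        print(f"{county}: p_i = {p:.3f}")
```

Setting a stratum's counties to p_i = 0 is not representable here by design, since the Horvitz-Thompson weights 1/p_i would then be undefined.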
Table 1 compares the bias and variance of the estimator under the sampling plan used in a given year and under an alternative plan in which all non-certainty counties were eliminated (i.e., their probability of inclusion was set to zero). The ratio of the two variances measures the relative significance of eliminating counties. Asymptotically we expect the ratio to be near one; the difference between the variances should be small enough that in the long run it does not significantly alter the results. But as the results in Table 1 show, in some of the races the variance ratio is far from one. This means that we would be underestimating our variance, and possibly calling elections that are in reality too close to call.
Because we were interested in eliminating counties from the exit poll, this study dealt only with sampling at the county level. In a real exit poll, however, there is also variation due to the selection of polling places and variation due to the individual voters. In a future study we would include that variation and could possibly conclude more definitively what practical significance eliminating counties has for exit poll results.