Flexible Censored Interval Regression with Experimental Applications

Daniel Walton and James McDonald, Economics Department

Introduction

Interval-censored data, or grouped data, arise from well-established measurement techniques employed in many areas of economics, including experimental economics. Estimates of model parameters obtained from such data depend critically on the model specification and the method of estimation. Some methods can yield inconsistent and biased estimators when the distributional assumption of the model is misspecified. Our approach, which applies partially adaptive estimation with flexible probability distributions, mitigates such problems, including in the presence of heteroskedasticity (nonconstant error variance) and skewness. In addition, we show that partially adaptive estimation applied in such settings can substantially increase the accuracy of the estimates and thus strengthen inference.

Welfare analysts often use discount rates when comparing the costs and benefits of a policy. Harrison et al. (2002) test the hypotheses that discount rates (i) are the same across households and (ii) are the same for all time horizons. They conclude that certain sociodemographic differences can account for variation in discount rates, and also that the time horizon does have a significant effect on individuals' discount rates. However, their analysis is subject to potential misspecification of the distributional assumption, which could weaken and potentially reverse their conclusions (Cook and McDonald 2013). Our project explores more flexible distributional assumptions in order to determine more accurately and with greater certainty which factors affect discount rates. In addition, our work provides greater insight into the time preferences of individuals and how these vary with individual characteristics.

Methodology

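The proposed model is linear, i.e.,

$$y^*_i = x_i\beta + \varepsilon_i,$$

where x_i is a vector of explanatory variables, β is a vector of coefficients, and ε_i is an error term. Only the thresholds of y^*_i are known; in other words, y^*_i is not observed directly but lies in a known interval. We estimate this model by Maximum Likelihood Estimation (MLE).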
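A minimal sketch of the interval-censored log-likelihood may help fix ideas. It assumes normal errors (the common benchmark discussed next) with a single scale parameter sigma; the function and argument names are hypothetical, and the authors' MATLAB implementation is not shown in the source. Each observation contributes the log of the probability that the latent y*_i = x_i β + ε_i falls inside its observed interval.

```python
import math


def normal_cdf(z):
    """Standard normal CDF written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_loglik(beta, sigma, X, lower, upper):
    """Log-likelihood of y*_i = x_i * beta + e_i, e_i ~ N(0, sigma**2),
    when only lower_i < y*_i <= upper_i is observed (assumed normal errors).

    Open-ended intervals use -math.inf or math.inf as bounds.
    """
    total = 0.0
    for x_i, lo, hi in zip(X, lower, upper):
        mu = sum(b * x for b, x in zip(beta, x_i))  # x_i * beta
        # Probability that the latent response falls inside its interval
        prob = normal_cdf((hi - mu) / sigma) - normal_cdf((lo - mu) / sigma)
        total += math.log(max(prob, 1e-300))  # guard against log(0)
    return total


# Hypothetical data: intercept plus one regressor, three interval responses
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
lower = [-math.inf, 0.1, 0.2]
upper = [0.1, 0.2, math.inf]
print(interval_loglik([0.1, 0.05], 0.2, X, lower, upper))
```

Replacing `normal_cdf` with the CDF of a more flexible distribution gives a partially adaptive version of the same likelihood, which would then be maximized over the coefficients and the distributional parameters.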

As noted in the introduction, the properties of the parameter estimates can be sensitive to the distributional assumptions. The most common implementation of the MLE approach to this type of data in the literature is based on the assumption of normally distributed errors. Adaptive or semiparametric estimation of econometric models avoids specifying a particular probability density function but may be difficult to implement. Partially adaptive estimation relaxes the normality assumption by adopting a more flexible probability density function to approximate the actual error distribution. We use the skewed generalized t (SGT) and the generalized beta of the second kind (GB2), each of which allows for a wide range of skewness and kurtosis.
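To illustrate how one density can cover a range of skewness and kurtosis, the sketch below codes an SGT density in one common parametrization, centered at its mode (u = y - mode). The source does not give the authors' exact parametrization, so the mode centering and the parameter names `sigma`, `lam`, `p`, and `q` are assumptions; `lam` skews the density by using a different scale on each side of the mode, while `p` and `q` govern peakedness and tail thickness.

```python
import math


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def sgt_pdf(u, sigma=1.0, lam=0.0, p=2.0, q=10.0):
    """SGT density at u = y - mode (mode-centered form, an assumption).

    sigma > 0 is a scale, -1 < lam < 1 controls skewness, and p, q > 0
    jointly control peakedness and tail thickness (kurtosis).
    """
    side = 1.0 + lam * (1.0 if u >= 0 else -1.0)  # scale multiplier on each side of the mode
    log_const = (math.log(p) - math.log(2.0 * sigma)
                 - math.log(q) / p - log_beta(1.0 / p, q))
    kernel = 1.0 + abs(u) ** p / (q * (sigma * side) ** p)
    return math.exp(log_const - (q + 1.0 / p) * math.log(kernel))


def trapezoid(f, a, b, n=200_000):
    """Composite trapezoid rule, used here only to check normalization."""
    h = (b - a) / n
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + i * h) for i in range(1, n)))


# A right-skewed member (lam > 0) still integrates to one
print(trapezoid(lambda u: sgt_pdf(u, sigma=1.0, lam=0.3, p=2.0, q=5.0), -60.0, 60.0))
```

In this parametrization, lam = 0 gives the symmetric generalized t, and lam = 0 with p = 2 and very large q approaches a normal density with variance sigma**2 / 2.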

We implemented the maximum likelihood estimation procedure in MATLAB. Because the objective function is high-dimensional and nonconvex, obtaining reliable estimates required a robust global optimization method, for which we used a stochastic grid approach. We first obtained starting values by estimating constrained versions of the model with fewer parameters, then generated a stochastic grid of several hundred thousand points around these starting values, and finally ran local optimization procedures from the grid points.
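The stochastic grid strategy can be sketched in Python as a multistart minimizer, for example applied to a negative log-likelihood. This is an illustration under stated assumptions, not the authors' MATLAB code: the Gaussian perturbations, the screening step that keeps only the best `n_local` grid points, the compass-search local optimizer, and the much smaller default grid are all choices made for this sketch. The toy objective `bumpy` is a hypothetical stand-in for a nonconvex likelihood surface.

```python
import math
import random


def compass_search(f, x0, step=0.5, tol=1e-6, max_iter=10_000):
    """Simple derivative-free local minimizer (compass/pattern search).

    Tries +/- step along each coordinate, accepts any improvement, and
    halves the step when a full sweep finds none.
    """
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for j in range(len(x)):
            for delta in (step, -step):
                cand = x.copy()
                cand[j] += delta
                f_cand = f(cand)
                if f_cand < fx:
                    x, fx, improved = cand, f_cand, True
                    break
        if not improved:
            step /= 2.0
    return x, fx


def stochastic_grid_minimize(f, start, spread=1.0, n_grid=10_000, n_local=20, seed=0):
    """Multistart minimization around starting values.

    1. Draw n_grid random points around `start` (Gaussian perturbations).
    2. Keep the n_local points with the lowest objective value.
    3. Run the local minimizer from each and return the best result.
    """
    rng = random.Random(seed)
    grid = [[s + rng.gauss(0.0, spread) for s in start] for _ in range(n_grid)]
    grid.append(list(start))  # always include the starting values themselves
    candidates = sorted(grid, key=f)[:n_local]
    results = [compass_search(f, c) for c in candidates]
    return min(results, key=lambda r: r[1])


def bumpy(x):
    """Toy objective with many local minima; global minimum -6 at the origin in 2-D."""
    return sum(xi * xi - 3.0 * math.cos(2.0 * math.pi * xi) for xi in x)


print(stochastic_grid_minimize(bumpy, [1.2, -0.9]))
```

Screening the grid before running local searches keeps the cost manageable when the grid is large; running a local search from every grid point, as the text may describe, is the more expensive but more thorough alternative.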

One way to construct these restricted models is to use the nesting properties of “families” of distributions. The SGT distribution contains as special cases the skewed generalized error distribution (SGED), skewed Laplace (SLaplace), generalized error distribution (GED), skewed t (ST), generalized t (GT), skewed normal (SNormal), t, skewed Cauchy (SCauchy), Laplace, Uniform, Normal, and Cauchy distributions. The GB2 distribution contains well-known distributions for positive-valued data as special cases, such as the generalized gamma (GG), gamma (GA), Weibull (W), lognormal (LN), Burr3, and Burr12. We used this approach in our MATLAB program.
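Nesting can be checked numerically. The sketch below codes the GB2 density in its standard four-parameter form, f(y) = |a| y^(ap-1) / [b^(ap) B(p, q) (1 + (y/b)^a)^(p+q)] for y > 0, and compares it with the Burr12 and Burr3 densities, which it reproduces when p = 1 and q = 1, respectively. The function names and parameter values are hypothetical choices for illustration.

```python
import math


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def gb2_pdf(y, a, b, p, q):
    """GB2 density for y > 0 (shape a, scale b > 0, shapes p, q > 0)."""
    if y <= 0:
        return 0.0
    log_f = (math.log(abs(a)) + (a * p - 1.0) * math.log(y) - a * p * math.log(b)
             - log_beta(p, q) - (p + q) * math.log1p((y / b) ** a))
    return math.exp(log_f)


def burr12_pdf(y, a, b, q):
    """Burr12 density, the derivative of F(y) = 1 - (1 + (y/b)**a)**(-q)."""
    return a * q * y ** (a - 1.0) / b ** a * (1.0 + (y / b) ** a) ** (-q - 1.0)


def burr3_pdf(y, a, b, p):
    """Burr3 density, the derivative of F(y) = (1 + (y/b)**(-a))**(-p)."""
    return a * p * y ** (a * p - 1.0) / b ** (a * p) * (1.0 + (y / b) ** a) ** (-p - 1.0)


# Setting p = 1 in the GB2 gives the Burr12; setting q = 1 gives the Burr3
for y in (0.5, 1.0, 2.5):
    print(gb2_pdf(y, 2.0, 1.5, 1.0, 3.0), burr12_pdf(y, 2.0, 1.5, 3.0))
    print(gb2_pdf(y, 2.0, 1.5, 4.0, 1.0), burr3_pdf(y, 2.0, 1.5, 4.0))
```

Because each special case is the general density with some parameters fixed, estimates from a restricted model translate directly into starting values for the general one.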

Results and Discussion

In this section we apply the methods described above to the problem of estimating individual discount rates in experimental economics, using a field experiment described in Harrison et al. (2002). In this experiment, a representative sample of 268 Danish individuals between 19 and 75 years old were invited to answer survey questions with real monetary rewards. To elicit discount rates, individuals were asked in a series of questions whether they would prefer $100 in one month or $100 + x in 1 + y months, where x > $0 and y = 6, 12, 24, or 36 months depending on the experimental condition. The amount x was varied from question to question. The point at which an individual switches from choosing the earlier payment to choosing the delayed payment therefore provides lower and upper bounds on his or her discount rate. Participants were told the interest rate associated with each future payment option and knew that they would be paid for one randomly selected question.
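How a switch point becomes an interval-censored observation can be sketched as follows. The premium schedule, the annual-compounding convention used to convert a premium x over a y-month delay into an annual rate, and the function names are all assumptions for illustration; the experiment itself reported its own interest rates to participants.

```python
import math


def implied_annual_rate(x, horizon_months, principal=100.0):
    """Annual rate r that equates `principal` now with `principal + x`
    paid `horizon_months` later, assuming annual compounding:
    (1 + r) ** (horizon_months / 12) = (principal + x) / principal.
    """
    return ((principal + x) / principal) ** (12.0 / horizon_months) - 1.0


def discount_rate_interval(offers, chose_later, horizon_months, principal=100.0):
    """Interval containing a respondent's discount rate.

    offers: increasing premiums x for waiting (one per question).
    chose_later: True where the delayed payment was chosen.
    Assumes a single switch from the earlier to the delayed payment.
    """
    rates = [implied_annual_rate(x, horizon_months, principal) for x in offers]
    switch = next((i for i, later in enumerate(chose_later) if later), None)
    if switch is None:
        return rates[-1], math.inf   # never switched: rate exceeds the largest offer
    if switch == 0:
        return 0.0, rates[0]         # took the first offer (assumes nonnegative rates)
    return rates[switch - 1], rates[switch]


# Hypothetical premiums for a 12-month delay; the respondent switches at x = 15
print(discount_rate_interval([5, 10, 15, 20], [False, False, True, True], 12))
```

The respondent who waits only once the premium reaches 15 reveals a rate above the 10 percent implied by the last rejected offer and at most the 15 percent implied by the first accepted one; only this interval, not the rate itself, enters the likelihood.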

The experiment was designed to test two specific hypotheses. The first hypothesis is that, for a given time horizon, discount rates do not differ with respect to individuals' socio-demographic characteristics. The second hypothesis is that, for a given individual, discount rates do not differ across time horizons. Harrison and Williams (2002) found that discount rates among this sample of Danish individuals are constant over the one-year to three-year time horizon studied, but vary significantly across several socio-demographic characteristics.

Using the SGT and GB2 specifications for the distributional assumption changes the results significantly in some respects. First, the hypothesis that the time coefficients are equal across time horizons is rejected for the Normal and t distributions. However, this hypothesis cannot be rejected for the GED, ST, GT, SGED, or SGT at conventional levels of significance. Second, the hypothesis that the socio-demographic variables are jointly unimportant is rejected for the Normal and t, but not for the GED, ST, GT, and SGT. This case (some members of the SGT family under homoskedasticity) is the only one of the four cases considered (SGT and GB2, each with and without heteroskedasticity) in which this holds. Moreover, some individual coefficient estimates are significant under one specification but not under another. Hence, the distributional assumptions affect the results of the analysis.
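The source does not state which test statistic underlies these comparisons. As one standard option for nested maximum likelihood models, a likelihood ratio test compares the log-likelihood of the unrestricted model with that of the model with the restrictions imposed; the sketch below computes the statistic and its asymptotic chi-square p-value using only the Python standard library. The function names are hypothetical, and the log-likelihood values in the example are made up, not results from this study.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability P(X > stat) for X ~ chi-square(df).

    Uses the series for the regularized lower incomplete gamma function
    P(a, x) with a = df / 2 and x = stat / 2.
    """
    a, x = df / 2.0, stat / 2.0
    if x <= 0.0:
        return 1.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (a + n)
        total += term
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(loglik_unrestricted, loglik_restricted, n_restrictions):
    """LR statistic 2 * (lnL_U - lnL_R) and its asymptotic chi-square p-value."""
    stat = 2.0 * (loglik_unrestricted - loglik_restricted)
    return stat, chi2_sf(stat, n_restrictions)


# Hypothetical log-likelihoods; equal time coefficients across four horizons
# would impose three restrictions
print(likelihood_ratio_test(-1000.0, -1004.0, 3))
```

The degrees of freedom equal the number of restrictions: forcing four horizon coefficients to a common value removes three free parameters.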

Given the shape of the fitted SGT distributions, we also estimate the model using a GB2 specification, which assumes positive-valued responses. Under this specification, the hypothesis of constant time effects is rejected, with the discount rate for the 6-month horizon slightly larger than for longer horizons. We also find several socio-demographic factors to be significant, such as wealth, a high level of education, home ownership, being retired, and being unemployed.

Conclusion

Many empirical applications in the experimental economics literature involve interval response data. Although traditional methods have been widely used in this literature, their properties can be sensitive to distributional assumptions, and they can yield inconsistent estimators, and thus misleading results, when the distribution is misspecified or heteroskedasticity is present. We considered the implications of assuming more flexible distributions, which allow for a wide range of skewness and kurtosis. We outlined an estimation method based on a flexible distribution that can accommodate a wide variety of distributional characteristics and hence has the potential to reduce the impact of distributional misspecification.

Relaxing the distributional assumption and allowing responses to exhibit skewness changes the implications of the analysis. In particular, our findings suggest that nominal discount rates are significantly affected by socio-demographic factors such as income, education, unemployment and retirement status, and creditworthiness. However, the magnitudes and statistical significance of these coefficients are sensitive to the distributional specification used. Our results are mixed with regard to the effect of the time horizon on nominal discount rates. Finally, to assess the impact of possible heteroskedasticity, we compared log-likelihood values and reject the assumption of homoskedasticity under the SGT specification, but not under the GB2.

Brigham Young University

Journal of Undergraduate Research