Patrick Turley and Dr. James B. McDonald, Department of Economics
Data truncation causes econometric problems in many economic datasets. Truncation occurs when all observations below or above a certain threshold are systematically removed or unavailable. For example, campaign contributions below a certain level are not usually publicly available, so any contribution below that level would not be included in the dataset. When such data are used in standard ordinary least squares (OLS) analysis, the resulting estimates are biased and inconsistent. Many methods have been developed to correct for truncation bias, with varying degrees of success. In my research, I examined a new quasi-maximum likelihood estimation method that uses flexible distributions and compared it to several existing methods. In past research, similar methods have proven more accurate and efficient than many standard models.
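To make the truncation problem concrete, here is a minimal Python sketch. It is not the Matlab code used in this project, and every value in it is an assumption chosen for illustration: a linear model y = 1·x + e with standard normal errors, x drawn uniformly on [-2, 2], and every observation with y at or below 0 dropped. The helper names `ols` and `simulate` are hypothetical. The sketch fits OLS once on the full sample and once on the truncated sample.

```python
import random


def ols(x, y):
    """Simple-regression OLS in closed form; returns (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(n=20000, b0=0.0, b1=1.0, sigma=1.0, threshold=0.0, seed=1):
    """Draw data from a known linear model, then drop every y <= threshold."""
    rng = random.Random(seed)
    x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    y = [b0 + b1 * xi + rng.gauss(0.0, sigma) for xi in x]
    # Truncation: observations at or below the threshold never reach the dataset.
    kept = [(xi, yi) for xi, yi in zip(x, y) if yi > threshold]
    x_t = [xi for xi, _ in kept]
    y_t = [yi for _, yi in kept]
    return ols(x, y), ols(x_t, y_t)


full_fit, truncated_fit = simulate()
print("slope, full sample:      %.3f" % full_fit[1])
print("slope, truncated sample: %.3f" % truncated_fit[1])
```

The full-sample slope lands close to the true value of 1. The truncated-sample slope is pulled well toward zero, because the observations that survive at low values of x are disproportionately those with large positive errors. Increasing `n` does not close the gap, which is the inconsistency described above.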
The Tobit model was one of the first to address truncated data (Amemiya 1973). It did so through maximum likelihood estimation under the assumption of a truncated normal error distribution. Unfortunately, the error in many economic datasets is not normally distributed, which leads to bias and inconsistency in Tobit results. In this research, we relaxed the normality assumption and instead assumed that the error belongs to a family of distributions, optimizing the likelihood function over both the parameters of the model and the distributional parameters. If the true error distribution is contained in the family imposed on the model, this estimator will be consistent. Other methods drop the distributional assumption altogether, using semiparametric methods or kernel estimators (Cosslett 2004, Lewbel and Linton 2002, Lee 1998). In addition to examining the merits of quasi-maximum likelihood methods in truncated models, we compared these methods to several existing ones.
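Suppose each observation follows y_i = b0 + b1·x_i + sigma·u_i and is observed only when y_i > c. The truncated likelihood then divides each observation's density by the probability of clearing the threshold. This gives l(b0, b1, sigma) = sum over i of [log f((y_i - mu_i)/sigma) - log sigma - log(1 - F((c - mu_i)/sigma))], with mu_i = b0 + b1·x_i and f, F the standardized error density and CDF. With f and F normal, this is the truncated normal likelihood.

The Python sketch below is illustrative and is not the project's implementation. It assumes a single regressor, and the names `truncated_loglik`, `norm_logpdf`, and `norm_sf` are hypothetical. The standardized log-density and survival function are passed in as arguments. A quasi-maximum likelihood version could therefore, in principle, supply a more flexible family, with its shape parameters bound in a closure and optimized alongside b0, b1, and sigma.

```python
import math


def norm_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_sf(z):
    """Standard normal survival function 1 - Phi(z), computed with erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def truncated_loglik(beta0, beta1, sigma, x, y, c, logpdf=norm_logpdf, sf=norm_sf):
    """Log-likelihood of y_i = beta0 + beta1 * x_i + sigma * u_i, where u_i has
    standardized log-density `logpdf` and survival function `sf`, and each
    y_i is observed only if y_i > c (left truncation at a known point c)."""
    if sigma <= 0.0:
        return -math.inf
    total = 0.0
    for xi, yi in zip(x, y):
        mu = beta0 + beta1 * xi
        # Density of y_i given x_i ...
        total += logpdf((yi - mu) / sigma) - math.log(sigma)
        # ... renormalized by the probability that y_i clears the threshold.
        # (sf can underflow to 0 under extreme truncation; a sturdier version
        # would work with a log-survival function instead.)
        total -= math.log(sf((c - mu) / sigma))
    return total
```

Maximizing this function over (beta0, beta1, sigma) with the default normal arguments gives the truncated normal MLE. As a quick sanity check, for fixed parameters and a single x, the exponentiated log-likelihood should integrate to one over y > c.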
In conducting this research, I analyzed these distributions and methods using Monte Carlo simulations and empirical data. For the Monte Carlo simulations, I wrote programs that generate data with known characteristics and then assess each estimation method's accuracy by comparing the known parameters with the estimated ones. In this analysis, the quasi-maximum likelihood estimators tended to outperform the semiparametric and nonparametric estimators examined. The one exception was the Cosslett estimator, which performed comparably.
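The Monte Carlo design described above can be sketched in Python. This is an illustrative stand-in for the Matlab programs used in the project, built on several assumptions:

- one regressor, normal errors, and left truncation at zero;
- only two estimators are compared: OLS on the truncated sample and the truncated normal MLE. The quasi-maximum likelihood and semiparametric estimators actually studied are not included.

To keep the example dependency-free, the MLE is found with a small hand-written Nelder-Mead routine. All function names are hypothetical. Each replication draws a fresh dataset with known parameters, truncates it, and estimates the slope. The harness then reports bias and root mean squared error (RMSE) across replications.

```python
import math
import random


def ols_slope(x, y):
    """Closed-form OLS slope for a single regressor with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


def neg_loglik(theta, x, y, c):
    """Negative truncated normal log-likelihood (constant dropped);
    theta = (b0, b1, log sigma), with y observed only when y > c."""
    b0, b1, log_s = theta
    if not -20.0 < log_s < 20.0:  # keep sigma in a numerically safe range
        return math.inf
    s = math.exp(log_s)  # log-parameterization keeps sigma positive
    total = 0.0
    for xi, yi in zip(x, y):
        mu = b0 + b1 * xi
        z = (yi - mu) / s
        tail = 0.5 * math.erfc((c - mu) / (s * math.sqrt(2.0)))  # P(y > c | x)
        if tail <= 0.0:
            return math.inf
        total += -0.5 * z * z - log_s - math.log(tail)
    return -total


def nelder_mead(f, x0, step=0.3, max_iter=400, tol=1e-8):
    """Minimal Nelder-Mead minimizer: reflection, expansion, contraction, shrink."""
    n = len(x0)
    pts = [list(x0)]
    for i in range(n):
        p = list(x0)
        p[i] += step
        pts.append(p)
    vals = [f(p) for p in pts]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda k: vals[k])
        pts = [pts[k] for k in order]
        vals = [vals[k] for k in order]
        if vals[-1] - vals[0] < tol:
            break
        cen = [sum(p[j] for p in pts[:-1]) / n for j in range(n)]
        worst = pts[-1]

        def move(t):
            # Point on the line from the worst vertex through the centroid.
            return [cj + t * (cj - wj) for cj, wj in zip(cen, worst)]

        refl = move(1.0)
        fr = f(refl)
        if fr < vals[0]:
            expd = move(2.0)
            fe = f(expd)
            pts[-1], vals[-1] = (expd, fe) if fe < fr else (refl, fr)
        elif fr < vals[-2]:
            pts[-1], vals[-1] = refl, fr
        else:
            con = move(-0.5)
            fc = f(con)
            if fc < vals[-1]:
                pts[-1], vals[-1] = con, fc
            else:  # shrink every vertex halfway toward the best one
                best = pts[0]
                pts = [best] + [[b + 0.5 * (q - b) for b, q in zip(best, p)] for p in pts[1:]]
                vals = [vals[0]] + [f(p) for p in pts[1:]]
    k = min(range(n + 1), key=lambda i: vals[i])
    return pts[k]


def monte_carlo(reps=20, n=400, b0=0.0, b1=1.0, sigma=1.0, c=0.0, seed=7):
    """Simulate data with known parameters, truncate at c, and report the
    (bias, RMSE) of each estimator's slope across replications."""
    rng = random.Random(seed)
    errors = {"OLS (truncated sample)": [], "truncated normal MLE": []}
    for _ in range(reps):
        x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
        y = [b0 + b1 * xi + rng.gauss(0.0, sigma) for xi in x]
        kept = [(xi, yi) for xi, yi in zip(x, y) if yi > c]
        xt, yt = [p[0] for p in kept], [p[1] for p in kept]
        slope = ols_slope(xt, yt)
        # Start the optimizer at the OLS fit with sigma = 1.
        start = [sum(yt) / len(yt) - slope * sum(xt) / len(xt), slope, 0.0]
        mle = nelder_mead(lambda th: neg_loglik(th, xt, yt, c), start)
        errors["OLS (truncated sample)"].append(slope - b1)
        errors["truncated normal MLE"].append(mle[1] - b1)
    return {name: (sum(e) / reps, math.sqrt(sum(v * v for v in e) / reps))
            for name, e in errors.items()}


results = monte_carlo()
for name, (bias, rmse) in results.items():
    print(f"{name:24s} bias {bias:+.3f}  RMSE {rmse:.3f}")
```

Because the data-generating parameters are known, bias and RMSE are measured directly against the truth. In this setup, the OLS slope should show a large negative bias. The MLE, whose distributional assumption matches the simulated errors, should be close to unbiased. One way to probe the misspecification problem that motivates the quasi-maximum likelihood approach is to rerun the harness with non-normal errors, for example `rng.expovariate(1.0) - 1.0`.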
For the empirical analysis, I used data from the Current Population Survey to compare the annual returns to education for individuals between the tenth and fiftieth percentiles of the income distribution with those for individuals between the fiftieth and ninetieth percentiles. The quasi-maximum likelihood methods indicated that lower-income individuals receive a return of around 18% in annual salary per year of schooling, compared with about 10% for higher-income individuals. The Cosslett estimator, however, indicated that lower-income individuals receive only around 1%, compared with around 2% for higher-income individuals. Thus, the results of this empirical example are ambiguous both in the magnitude of the effect and in which group receives the larger return. Several problems have since been identified with the analysis, and they are currently being corrected.
This project has given me many opportunities to do real academic research, and I have been involved in every step of the research process. I came up with the research idea last year, when I wrote a very simple version of this paper for one of my courses. I've had the chance to conduct background research to find out what has already been done in this area. I wrote programs in Matlab that implement the methods we were researching and comparing, and that run both on my desktop and on the supercomputer. I gathered data to analyze with these programs. Finally, I wrote the paper that we are preparing for publication. It has been great to see a project through from start to finish.
The main outcome of this research has been my honors thesis. Dr. McDonald and I have also submitted our research to the Joint Statistical Meetings, where we will present it in August. I also hope to coauthor a paper with Dr. McDonald and submit it to an academic journal such as the Journal of the American Statistical Association or the Journal of Business and Economic Statistics.
Sources
- Amemiya, T. (1973) “Regression Analysis when the Dependent Variable is Truncated Normal,” Econometrica 41, 997-1016.
- Cosslett, S.R. (2004) “Efficient Semiparametric Estimation of Censored and Truncated Regressions Via a Smoothed Self-consistency Equation,” Econometrica 72 (4), 1277-1293.
- Lee, M.-J., and Kim, H. (1998) “Semiparametric Econometric Estimators for a Truncated Regression Model: a Review with an Extension,” Statistica Neerlandica 52 (2), 200-225.
- Lewbel, A., and Linton, O. (2002) “Nonparametric Censored and Truncated Regression,” Econometrica 70 (2), 765-779.