Wayne Sandholtz and Dr. Joseph Price, Department of Economics
The purpose of this project was to examine whether consumers who buy a wider variety of distinct fresh fruits and vegetables pay lower average prices for fresh produce because they can substitute away from individual fruits and vegetables during price shocks. To answer this question, I used Nielsen Homescan data, which record all grocery purchases of a nationally representative sample of households. Using regression analysis, I found little direct evidence that households that buy a wider variety of fresh fruits and vegetables pay lower prices for them. In fact, I was surprised to find that, in general, higher-variety households seemed to pay higher prices for fresh produce. However, by comparing the Nielsen data to national price index data, I also found evidence that in times of price shocks, high-variety households pay (very slightly) lower prices.
The motivation for this paper comes from studies that find that lower socioeconomic status is consistently associated with lower consumption of fresh fruits and vegetables, as well as with poorer overall health. I wanted to determine whether part of this consumption gap could be attributed to differences in the variety of distinct fresh fruits and vegetables that households of various income and education levels consume. Other studies have found that, in general, price elasticity of demand for fresh fruits and vegetables is low; that is, consumers don’t change their consumption very much in response to price changes. The Nielsen data allowed me to see whether this held true for consumers at all levels of produce consumption variety.
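The two household-level quantities this question turns on, produce variety and average price paid, can be illustrated with a small sketch. It is written in Python purely for illustration (the analysis itself was done in Stata). The transaction layout, the sample rows, and the definitions used here (variety as a count of distinct items, price as total dollars divided by total pounds) are assumptions made for the example, not necessarily the measures used in the paper.

```python
from collections import defaultdict

# Hypothetical transactions: (household_id, produce_item, pounds, dollars_paid).
# These rows are made up for illustration; they are not Nielsen data.
transactions = [
    ("h1", "apples", 2.0, 3.00),
    ("h1", "bananas", 1.0, 0.60),
    ("h1", "carrots", 1.0, 0.90),
    ("h2", "apples", 3.0, 5.10),
    ("h2", "apples", 1.0, 1.70),
]


def summarize(rows):
    """Return {household: (variety, average price per pound)}.

    Variety is assumed to be the number of distinct items bought;
    average price is total dollars divided by total pounds.
    """
    items = defaultdict(set)
    pounds = defaultdict(float)
    dollars = defaultdict(float)
    for household, item, lbs, paid in rows:
        items[household].add(item)
        pounds[household] += lbs
        dollars[household] += paid
    return {h: (len(items[h]), dollars[h] / pounds[h]) for h in items}


summary = summarize(transactions)
```

With one row per household of this form, average price can then be regressed on variety along with controls such as income and education.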
One of the biggest challenges in this project was also one of its strongest points: the dataset was huge. I included more than 166 million transactions from more than 8,000 households across six years. A dataset of this size made my estimates very statistically precise, but it also presented practical problems. The dataset was too big to analyze on my own computer, so I had to work remotely on one of BYU’s computer clusters. Even with the enormous computing power of this cluster, the analyses took a long time to run. I used a small sample of the dataset while I was writing my code, but when I ran the completed code on the whole dataset, it took more than a day to finish.
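One way to draw a small development sample from household-level data is to sample whole households rather than individual transactions, so that each sampled household keeps its full purchase history. The sketch below is in Python for illustration only (the analysis itself was done in Stata). The function name, the row layout, and the hashing approach are assumptions for the example; the report does not describe how the development sample was actually drawn.

```python
import hashlib


def in_dev_sample(household_id, percent=1):
    """Return True if this household falls in the development sample.

    Hashing the ID assigns each household a stable bucket from 0 to 99,
    so the same households are selected on every run.
    """
    digest = hashlib.sha256(str(household_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


# Hypothetical rows: (household_id, item, dollars_paid), three per household.
rows = [(hh, "apples", 1.50) for hh in range(1000) for _ in range(3)]

# Keep every transaction belonging to a sampled household.
dev_rows = [row for row in rows if in_dev_sample(row[0], percent=5)]
```

Because selection depends only on the household ID, code debugged on the sample sees the same households every time, and the full run differs only in scale.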
Another challenge arose from gaps in the dataset. The Nielsen data are remarkably detailed, but certain important variables were nearly impossible to measure, such as the quality of the produce bought. In my analysis, a pound of pristine Honeycrisp apples looked just the same as a pound of rotten Red Delicious seconds. Income figures were also only approximate: each household was assigned to one of a number of income brackets with cutoffs about $5,000 apart. More precise income figures would have been valuable, since income was one of the most important control variables in my analysis.
What I Learned
Each challenge was also an opportunity to learn. I picked up many valuable techniques for working with enormous datasets, so the next time I have to analyze a dataset of this size, I should be able to do so much more efficiently. I also honed my coding skills (I did all the analysis in Stata). Reviewing the literature on this subject was instructive as well, and it gave me other ideas that could cast light on my research questions from different angles.
In future iterations of the paper, I hope to take greater account of geography. The Nielsen data from 2002 onward include census tract information, so it would be possible to include a variable for each household’s distance from a grocery store, which I believe could help disentangle the effect of preferences from the effect of availability. There is also an “expenditure function” approach pioneered by Jerry Hausman and Ephraim Leibtag, which I hope to learn to adapt to my project.
I presented my preliminary results at a poster session of the National Conference on Undergraduate Research in March 2012. Building on that experience, this research eventually turned into my honors thesis, which I successfully defended in December 2012. I plan to produce an improved version that incorporates the helpful suggestions from my honors thesis committee and to submit it to an economics journal as a paper coauthored with my advisor, Joe Price.