Mergers and Acquisitions Document Vector Space Analysis

Steven Fortney and Faculty Mentor: Karl Diether, Finance

Brief Overview of Methodology and Results

This project is a textual analysis of the SEC filings associated with mergers and acquisitions. We index, clean and match documents in the SEC EDGAR database to the CRSP and Compustat databases in order to construct multiple instruments for “specificity” and uniqueness in a filing. We then scrape each document and use the data to create these instruments. One instrument for specificity uses parts of the Stanford Natural Language Processing library and the NLTK and GenSim libraries in Python to convert each filing into a vector in a vector space. This allows us to find the cosine distance between any document and the “mean” document/vector in the vector space (a la Hoberg and Hanley, 2010 RFS). We then find the daily spread between the returns of firms in the third and first terciles of our instruments for 20 years of data (holding each firm in portfolio for 36 months). We find that the returns from our instrument deliver a strongly significant abnormal return over the Fama-French 3-factor model.

Testing

The hypothesis that we tested is that firms with more concrete or specific plans going into a merger are more likely to be merging for economically sound reasons than firms that might be merging for reasons such as ‘empire building’. We tested this by constructing a number of unique instruments for specificity, including a cutting-edge document vector space instrument.

Using the SEC filings* that all publicly traded firms must submit before they conduct a merger, we construct multiple instruments for “specificity” in a filing and hope to find that as firms score higher on these instruments, their post-merger returns increase accordingly.

To do this project we first had to match the EDGAR M&A database of SEC filings (about 3,000 documents) to the commonly used SDC database on M&A. Each document had to be cleaned, stripped of HTML and scanned for the essential information needed to do the matching. After matching the two databases we had definitive dates for when each merger was effected, the market capitalization of the two firms, and so on.
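
A minimal sketch of the HTML-stripping step, using only Python's standard library. The class and function names and the sample text are illustrative assumptions, not taken from the project's code, and real EDGAR filings would likely need more handling (tables, headers, embedded documents).

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_html(raw):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any text still buffered by the parser
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


# Hypothetical fragment of a filing, for illustration only.
sample = "<html><body><p>Dear Stockholder:</p><p>The merger of A&amp;B Corp.</p></body></html>"
print(strip_html(sample))
```

The cleaned text can then be scanned for the fields (firm names, dates) used to match a filing against the SDC records.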

Having completed the matching, we considered various analogues for specificity in a filing. The first and roughest instruments we looked at were raw line length and de-trended line length. When we regressed abnormal (relative to the market) post-merger returns on these two variables we found results in the correct direction (positive correlation) but lacking the desired statistical significance.

We then used the Stanford Natural Language Processing Library to count the proper nouns in each document, scaled by the document's total word count. A similar test again gave us results in the correct direction but lacking statistical significance. Thus we concluded that naive textual analysis techniques were insufficient as predictors of post-merger returns.

Finally we converted the whole corpus of documents into a vector space. This is similar to the method employed by Hoberg and Hanley (2010, RFS), but we used a TF-IDF transformation on the vector space instead of a straight word count. For details on how this is done and what a TF-IDF transformation is, the Wikipedia page is very illustrative, but the basic idea is that the significance weight attributed to each word is scaled according to its commonness in the corpus: words that appear in most documents receive little weight. With our corpus translated into a vector space (i.e. each document can be thought of as a vector), we then calculated the cosine distance between each document and the mean of all documents. Hoberg and Hanley used a similar method to see if IPO prospectuses were more or less “boilerplate” or generic than those of their counterparts. Apart from the TF-IDF weighting, which we consider an improvement, our method is basically the same, applied here to check the specificity of M&A proxy statements.
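
The calculation described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the project's GenSim pipeline: the natural-log IDF formula, the unit-length normalization before averaging, the function names and the three-document corpus are all choices made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns unit-length TF-IDF vectors.

    Assumption: idf = ln(N / document frequency); other variants exist.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(len(docs) / df[tok]) for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(x * x for x in vec))
        # Normalize so long documents do not dominate the mean (assumption).
        vectors.append([x / norm for x in vec] if norm > 0 else vec)
    return vectors


def cosine_distance(u, v):
    """1 minus cosine similarity; defined as 1.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def distance_from_mean(vectors):
    """Cosine distance of each document vector from the corpus mean vector."""
    n = len(vectors)
    mean = [sum(col) / n for col in zip(*vectors)]
    return [cosine_distance(v, mean) for v in vectors]


# Toy corpus: two identical "boilerplate" filings and one more specific one.
docs = [
    "the merger is in the best interests of stockholders".split(),
    "the merger is in the best interests of stockholders".split(),
    "the merger combines our turbine plant with their grid contracts".split(),
]
dists = distance_from_mean(tfidf_vectors(docs))
print(dists)
```

In this toy corpus the third document, which shares only the zero-weight words “the” and “merger” with the others, ends up farther from the mean vector than the two boilerplate documents, which is the sense in which distance from the mean serves as a specificity score.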

*These filings primarily contain copies of the letters sent to stockholders before a proxy vote on the merger.

Results

Using our vector space instrument as outlined above, we then found the daily spread between the returns of firms in the third and first terciles of our instruments for 20 years of data (holding each firm in portfolio for 36 months). We found that the returns from our instrument deliver a strongly (statistically) significant abnormal return over the Fama-French 3-factor model. One problem with our results that we are trying to address is that the ‘beta’, or regression coefficient on market performance, is very low relative to what you might expect for the average stock. In layman’s terms this means that the (merging) stocks we are looking at co-vary very little with the market. Though not a problem in and of itself, it is curious enough that it demands further testing. We are currently working on out-of-sample testing to see if this is a unique result or occurs generally with mergers.
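
The test described above can be sketched as follows, assuming NumPy is available. Everything here is illustrative: the data are simulated rather than taken from CRSP, the function names are hypothetical, portfolios are equal-weighted, and the 36-month holding-window logic is omitted. Because the tercile spread is a zero-investment (long-short) return, it is regressed directly on the three factors.

```python
import numpy as np


def tercile_spread(scores, returns):
    """Daily top-minus-bottom tercile return, equal-weighted.

    scores: (n_firms,) instrument values; returns: (n_days, n_firms).
    """
    scores = np.asarray(scores, dtype=float)
    returns = np.asarray(returns, dtype=float)
    lo, hi = np.quantile(scores, [1 / 3, 2 / 3])
    bottom, top = scores <= lo, scores >= hi
    return returns[:, top].mean(axis=1) - returns[:, bottom].mean(axis=1)


def ff3_alpha(spread, factors):
    """OLS of the spread on an intercept plus three factor columns.

    factors: (n_days, 3), assumed ordered as market excess return, SMB, HML.
    Returns (alpha, betas, t-statistic of alpha) with classical OLS errors.
    """
    y = np.asarray(spread, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(factors, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[0], coef[1:], coef[0] / np.sqrt(cov[0, 0])


# Simulated data: every firm has a market beta of 1, plus a small daily
# drift that rises with the firm's score (a built-in "true" effect).
rng = np.random.default_rng(0)
n_days, n_firms = 1000, 90
factors = rng.normal(0.0, 0.01, size=(n_days, 3))
scores = rng.uniform(size=n_firms)
noise = rng.normal(0.0, 0.01, size=(n_days, n_firms))
returns = factors[:, [0]] + 0.002 * scores + noise

alpha, betas, t_alpha = ff3_alpha(tercile_spread(scores, returns), factors)
print(alpha, betas, t_alpha)
```

In this simulation the long and short legs have the same market exposure, so the spread's market beta is close to zero by construction. That is one mechanical reason a long-short portfolio can show a low beta, although it does not by itself explain the low beta in our results.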

Importance and Conclusion

This project is important because the valuation of mergers is a tricky problem in finance. Thus it is a challenge to find anything that proves to be a good ex-ante predictor of post-merger returns. We are excited that we have a good predictive instrument based solidly in past published literature. Though the results are not perfect, the initial return patterns are encouraging and we are currently working on proving the robustness of our results.

Brigham Young University

Journal of Undergraduate Research