Krista Klingler and Dr. John Kauwe, Department of Biology
Alzheimer’s disease is the most common form of dementia, affecting over 35 million people worldwide. However, the pathology of the disease is not fully understood. Because of this, no cure for the disease currently exists, and once a patient has been diagnosed with Alzheimer’s disease, death usually occurs within ten years.1 This has led to a concentrated effort on understanding the genetics behind Alzheimer’s disease.
The goal of my ORCA project was to study one piece of the Alzheimer’s disease genetics puzzle. I chose to study members of the matrix metalloproteinase (MMP) family, a group of proteins whose levels have been shown to differ between patients with Alzheimer’s disease and healthy individuals.2 Understanding the genetics behind these differing MMP levels could therefore point to new targets for treatment research. I aimed to determine which genetic variants were seen more often in case individuals than in control individuals. To do this, I performed a genome-wide association study for six members of the MMP protein family (MMP 1, 2, 3, 7, 9, and 10). A genome-wide association study is a statistical approach that uses linear regression analysis to determine which genetic variants are more commonly seen in affected patients than in healthy individuals. The reasoning behind this approach is that variants found more often in affected patients are likely to contribute to development of the disease.
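As a rough illustration of the regression step described above, the following Python sketch tests a single variant against a protein level. It is not the PLINK implementation used in the project. It assumes an additive genotype coding (0, 1, or 2 copies of an allele), ignores covariates and missing data, and uses a hypothetical function name, additive_association.

```python
import math


def additive_association(genotypes, levels):
    """Regress a protein level on allele count (0, 1, or 2) for one variant.

    Returns (slope, t_statistic). Illustrative assumptions: additive coding,
    no covariates, no missing values, no multiple-testing correction.
    """
    n = len(genotypes)
    if n != len(levels) or n < 3:
        raise ValueError("need matched lists with at least 3 samples")

    mean_g = sum(genotypes) / n
    mean_y = sum(levels) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("variant does not vary in this sample")

    # Ordinary least-squares slope and intercept.
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, levels))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g

    # Residual variance gives the standard error of the slope.
    rss = sum((y - (intercept + slope * g)) ** 2
              for g, y in zip(genotypes, levels))
    se = math.sqrt(rss / (n - 2) / sxx)

    if se > 0:
        t_stat = slope / se
    elif slope == 0:
        t_stat = 0.0  # flat trait: no evidence of association
    else:
        t_stat = math.copysign(math.inf, slope)  # perfect linear fit
    return slope, t_stat
```

In a genome-wide study, a test like this is repeated for every variant, and the significance threshold is tightened to account for the number of tests.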
Overall, the project went as planned. The data used in this project came from a dataset available in the Kauwe lab which had been obtained through a partnership with Washington University in St. Louis. This consisted of genetic data from about 350 individuals, some case and some control. The statistical analysis was performed using a software package called PLINK.3 Because of the computational power required for an analysis of this type, the analyses were run on the Mary Lou Supercomputer.
After all of the analyses were run, there were significant results for three of the MMP proteins: MMP 3, 7, and 10. There were four genetic variants for MMP3, two for MMP7, and one for MMP10 that significantly changed the levels of the respective protein. Each of these significant variants was researched further on the NCBI databases.4 I found that all of them resided in regions that did not actually code for proteins, leading us to conclude that the effect of these variants is more complex than a simple nonsynonymous mutation. However, understanding exactly how these variants lead to differences in protein levels is beyond the scope of the work performed in the lab.
Even with the work put into the experimental design, there were still setbacks with the project. One major problem arose when I was verifying the initial results from the analyses. Because the data were not normally distributed, I needed to perform additional permutation analysis to verify the original results. However, the results from the permutations appeared in a vastly different order than the association study results. After I consulted with my mentor, we realized that our dataset had not been correctly filtered for bad values, as we had initially believed. The problem was fixed when the correct filters were added to our statistical analysis.
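The idea behind a permutation analysis can be sketched in a few lines of Python. This is a simplified illustration, not the procedure PLINK runs. The function names slope and permutation_p_value are hypothetical, and the statistic (the absolute regression slope) is an assumption chosen for brevity. Shuffling the protein levels breaks any real link to genotype, so the shuffled statistics show what chance alone produces, without assuming the data are normally distributed.

```python
import random


def slope(genotypes, levels):
    """Least-squares slope of protein level on allele count."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(levels) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, levels))
    return sxy / sxx


def permutation_p_value(genotypes, levels, n_perm=1000, seed=0):
    """Empirical two-sided p-value from shuffling levels across individuals."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(slope(genotypes, levels))
    shuffled = list(levels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Count shuffles that look at least as extreme as the real data.
        if abs(slope(genotypes, shuffled)) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Because the p-value depends only on the ranking of the observed statistic among the shuffled ones, unfiltered bad values can distort it differently than they distort a regression p-value, which is one way the two sets of results can end up ordered differently.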
Also, I initially had problems with the covariates I selected. Statistically speaking, covariates are factors that, if not properly accounted for, could alter the results. I originally used the statistical software SAS to find all covariates for each protein and then included these in the analyses. However, many of the covariates were traits that were directly related to Alzheimer’s disease. My mentor helped me to realize that by including these, I was actually removing any tie the protein levels had to Alzheimer’s disease. Once a new covariate set was created, we were able to find significant results.
Once the association studies were done, I decided to expand my analysis of the MMP proteins further. I also performed a set-based analysis with all of the genetic variants found in the regions that code for the MMP proteins. A set-based analysis treats a group of markers as a single unit and determines the overall effect that group has on a certain protein level. I used this type of analysis with each of the six MMP proteins to determine the extent to which all of the variants worked together to affect levels of one protein. This allowed me to gain a general idea of how much the MMP proteins functioned together as a group. I did not find significant evidence in these analyses to conclude that expression levels of an MMP protein are partially controlled by variants in other MMP proteins.
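A set-based test can be approximated with the following Python sketch. It makes simplifying assumptions and does not reproduce PLINK's set-based procedure. Here the set statistic is the mean squared correlation between each variant and the protein level, and its significance is judged by permuting the protein levels. The function names r_squared, set_statistic, and set_based_p_value are hypothetical.

```python
import random


def r_squared(genotypes, levels):
    """Squared correlation between one variant's allele counts and a trait."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(levels) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in levels)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no measurable association
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, levels))
    return sxy * sxy / (sxx * syy)


def set_statistic(variant_set, levels):
    """Treat the whole set as one unit: average the per-variant statistics."""
    return sum(r_squared(g, levels) for g in variant_set) / len(variant_set)


def set_based_p_value(variant_set, levels, n_perm=1000, seed=0):
    """Empirical p-value for the set, from shuffling the trait values."""
    rng = random.Random(seed)
    observed = set_statistic(variant_set, levels)
    shuffled = list(levels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if set_statistic(variant_set, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Each genotype list in variant_set must be ordered by individual in the same way as levels. Shuffling the trait values rather than the genotypes keeps the correlation among variants in the set intact.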
I am currently drafting a manuscript describing our completed research for this project. After editing, I anticipate that it can be published in a peer-reviewed journal such as PLoS ONE. Having the opportunity to work on this project has benefitted me greatly as a bioinformatics major. Before taking on this project, I had taken a course that taught how these types of analyses are performed and how they can benefit disease research. However, it was not until I undertook a genome-wide association study on my own that I truly began to understand how complex analyses of this type are. Problems that we had talked about in class became real as I encountered them in my research. Without the help of this ORCA grant, I would not have been able to gain this hands-on experience in data analysis work. The skills I learned through this project will prove vital in helping me continue my study of the genetics of human diseases as I start my career.
References
1. Querfurth, H. W. and F. M. LaFerla (2010). “Alzheimer’s Disease.” New England Journal of Medicine 362(4): 329-344.
2. Horstmann, S., L. Budig, et al. (2010). “Matrix metalloproteinase in peripheral blood and cerebrospinal fluid with low beta-amyloid 1-42 levels.” Neuroscience Letters 466(3): 135-138.
3. Available at http://pngu.mgh.harvard.edu/~purcell/plink/.
4. Available at http://www.ncbi.nlm.nih.gov/.