Mark Wadsworth and John S.K. Kauwe, Biology
Introduction:
Alzheimer’s Disease (AD) is a debilitating disease that has brought hardship to many lives. Many people above the age of 65 have been diagnosed with this genetically complex disease, which burdens those diagnosed and their families both financially and emotionally. Earlier this year, a study using a dataset of individuals who had had strokes found that individuals with AB blood type are at greater risk of developing cognitive impairment[1]. That study, however, did not examine the genetics of AD. My objective in this study was to test whether blood type is genetically associated with AD risk or with age of onset of AD. In the end, we could not verify the results of the previously mentioned study.
Methods:
Dr. Kauwe has a dataset of 37,000 individuals provided by the Alzheimer’s Disease Genetics Consortium (ADGC). I began by meeting with Dr. Kauwe to design the experiment around the available data. After we decided which variants to look for and verified that they were in the dataset, he advised me to start by phasing the data with a program named ShapeIt. Phasing determines which of the two copies of a chromosome each variant lies on. I needed this information to tell which blood type each individual had, because blood type depends on which variants occur together on the same chromosome copy. I then wrote a program in Python to take in the ShapeIt results and label the blood type of each individual, given the set of Single Nucleotide Variants (SNVs). Next, I created a file containing the case/control status of each individual and ran several association studies to see whether there was a correlation between case/control status and blood type. Finally, I ran a power analysis to see whether our results had sufficient statistical power to report.
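The blood-type labeling step can be sketched as follows. This is only an illustration of why phasing matters, not the program used in the study: it assumes each phased haplotype has been reduced to two hypothetical flags, `has_o_deletion` (the haplotype carries an O-defining variant) and `has_b_variant` (the haplotype carries a B-defining variant). The actual SNVs and the ShapeIt output parsing are not described here and are left out.

```python
# Minimal sketch: call ABO blood type from two phased haplotypes.
# Each haplotype is a (has_o_deletion, has_b_variant) pair of booleans.
# Both flag names are hypothetical placeholders, not the study's variants.

def haplotype_allele(has_o_deletion, has_b_variant):
    """Return the ABO allele ('A', 'B', or 'O') carried by one haplotype."""
    if has_o_deletion:
        # A loss-of-function variant yields an O allele regardless of
        # whatever else sits on the same chromosome copy.
        return "O"
    return "B" if has_b_variant else "A"


def blood_type(hap1, hap2):
    """Combine the alleles of both haplotypes into an ABO blood type."""
    alleles = {haplotype_allele(*hap1), haplotype_allele(*hap2)}
    if alleles == {"A", "B"}:
        return "AB"
    if "A" in alleles:
        return "A"   # AA or AO
    if "B" in alleles:
        return "B"   # BB or BO
    return "O"       # OO


# Two individuals with identical unphased genotypes (heterozygous at both
# sites) receive different blood types depending on the phase.
same_copy = blood_type((True, True), (False, False))
opposite_copies = blood_type((True, False), (False, True))
print(same_copy, opposite_copies)
```

In the first individual both variants sit on the same chromosome copy, so that copy is an O allele and the other is an A allele. In the second individual the variants sit on opposite copies, giving one O allele and one B allele. Unphased genotypes alone cannot distinguish these two cases.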
Results:
After a couple of months of designing and programming, I worked out how to define the blood type of each individual based solely on the phased genetic data. I then ran the data for all of the individuals through the pipeline and extracted the necessary data. The results of the association studies were unimpressive. My hypothesis was that AB blood type would show up as protective against the development of AD. I found no association, protective or otherwise, between blood type and AD, except for B blood type. This result, that B blood type appeared protective, may be coincidental and will be investigated further in the future. After the association studies, I ran a power analysis to check whether we had the statistical power to conclude that we could not substantiate the earlier study's findings on AB blood type. I found that we would need considerably more genomes than we had available to draw that conclusion with my study design.
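The idea behind the power check can be illustrated with a simple two-proportion calculation. This is a sketch under stated assumptions, not the analysis that was run: it uses a normal approximation for comparing the frequency of a blood type in cases and controls, it ignores covariates, and the frequencies and sample sizes in the usage lines are hypothetical placeholders rather than values from the ADGC data.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p_cases, p_controls, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided test that two proportions differ.

    Uses the normal approximation with an unpooled standard error.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    se = sqrt(p_cases * (1 - p_cases) / n_cases
              + p_controls * (1 - p_controls) / n_controls)
    z_effect = abs(p_cases - p_controls) / se
    # Probability that the test statistic falls in either rejection region.
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)


# Hypothetical example: a rare blood type at 4.0% in controls vs 4.5% in cases.
small_study = two_proportion_power(0.045, 0.040, 2_000, 2_000)
large_study = two_proportion_power(0.045, 0.040, 20_000, 20_000)
print(f"power with 2,000 per group:  {small_study:.2f}")
print(f"power with 20,000 per group: {large_study:.2f}")
```

Because a rare blood type gives only a small absolute difference in frequency between groups, power rises slowly with sample size, which is consistent with needing many more genomes to draw a firm conclusion.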
Conclusion:
Unfortunately, we could neither substantiate nor negate the original findings. However, this was a very formative experience for me. I learned a variety of things about how to design, execute, and substantiate population genetics studies. From Dr. Kauwe, I learned what makes a good hypothesis and how to formulate one. Working with him, I was able to develop my questions and focus them on one thing, so that I avoided performing too many tests on the data, which lessens the statistical power. I also had to learn a lot about genetics and biology in order to write the program that analyzes the phased data and accurately predicts blood type. How the hypothesis test is executed is very important to the outcome of the association studies. I had to decide which variables to use as covariates in the statistical regression. To make that decision, I had to learn about each of the measurements that were collected with the data. I also learned a lot from Dr. Kauwe about choosing covariates strategically. I think the single most important takeaway from this project is the role of statistical power. Without sufficient power, results cannot be reported because they may or may not be correct. Dr. Kauwe helped me understand how to work through and interpret the statistics behind this and why it is necessary. In conclusion, I was disappointed that I was not able to substantiate my hypothesis, but I learned some very important things that have helped my personal development as a scientist. These lessons have helped me grow and succeed as I have applied them in my internship and in my classes. I am very grateful to the ORCA program for giving me this opportunity to learn more about science and for pushing me out of my comfort zone so that I could learn and grow in so many ways.