Sage Wright and John Kauwe, Biology
Introduction
Diabetes and other metabolic diseases are among the leading causes of death in Pacific Islanders [cdc]. One proposed explanation for the rise in type 2 diabetes is the thrifty genotype hypothesis [Southam]. This hypothesis suggests that certain genes that promote fat storage were beneficial for our ancestors in times of famine. When food was plentiful, these genes promoted fat deposition, allowing individuals to store nutrients they would need when food was no longer available. In modern society, however, food is readily available, and these thrifty genes are no longer beneficial; instead, they may lead to negative consequences because the times of famine never arrive [Speakman]. Individuals who carry these genes gain weight more rapidly, and because no famine follows to burn off this weight, it continues to accumulate, leading to health complications such as diabetes and other metabolic diseases. Identifying regions of positive selection in the metabolic pathway may point to ways of alleviating the health issues that affect many individuals in this population.
Methods
SelecT, a program created by the Ridge Lab at BYU, identifies regions of positive selection. Before this program can be run, the genetic data must be converted into several different file formats. The data preparation steps are outlined below.
- Clean SNP data using Plink
- Merge WGS (.vcf files) and SNP data using Plink
- Extract PIRs from the .bam and .vcf data files
- Phase the PIRs and merged WGS/SNP data files using SHAPEIT2
- Map ancestral states to SNP identifiers
After these five steps, SelecT can then be run using the phased data and ancestral states.
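The preparation steps above can be sketched as a small Python script that builds the command lines for each stage without running them. This is only an illustration under stated assumptions, not the exact commands used in this project: the file names, the filtering thresholds (--geno 0.05, --maf 0.01), the extra conversions between Plink binary files and VCF, and the use of the merged VCF for PIR extraction are all assumed here, and the helper map_ancestral is a hypothetical stand-in for step five, since the input format SelecT expects is not described in this report.

```python
from typing import Dict, Iterable, List, Tuple


def build_pipeline(snp: str, wgs_vcf: str, bam_list: str,
                   out_dir: str = "prep") -> List[List[str]]:
    """Return the shell commands for steps 1-4 as argument lists.

    All paths and thresholds are illustrative assumptions.
    """
    clean = f"{out_dir}/snp_clean"
    wgs = f"{out_dir}/wgs"
    merged = f"{out_dir}/merged"
    pirs = f"{out_dir}/pirs.list"
    return [
        # Step 1: clean the SNP data (assumed missingness and MAF filters).
        ["plink", "--bfile", snp, "--geno", "0.05", "--maf", "0.01",
         "--make-bed", "--out", clean],
        # Step 2a: convert the WGS VCF to Plink binary format.
        ["plink", "--vcf", wgs_vcf, "--make-bed", "--out", wgs],
        # Step 2b: merge the cleaned SNP data with the WGS data.
        ["plink", "--bfile", clean,
         "--bmerge", wgs + ".bed", wgs + ".bim", wgs + ".fam",
         "--make-bed", "--out", merged],
        # Export the merged data as VCF for read-aware phasing.
        ["plink", "--bfile", merged, "--recode", "vcf", "--out", merged],
        # Step 3: extract phase-informative reads (PIRs) from the BAM files.
        ["extractPIRs", "--bam", bam_list, "--vcf", merged + ".vcf",
         "--out", pirs],
        # Step 4: phase with SHAPEIT2 using the PIRs.
        ["shapeit", "-assemble", "--input-vcf", merged + ".vcf",
         "--input-pir", pirs, "-O", f"{out_dir}/phased"],
    ]


def map_ancestral(
    snps: Iterable[Tuple[str, str, int]],
    ancestral: Dict[Tuple[str, int], str],
) -> Dict[str, str]:
    """Step 5 (hypothetical helper): map SNP identifiers to ancestral alleles.

    snps yields (chromosome, snp_id, position); ancestral is keyed by
    (chromosome, position). SNPs without a known ancestral state are skipped.
    """
    return {
        snp_id: ancestral[(chrom, pos)]
        for chrom, snp_id, pos in snps
        if (chrom, pos) in ancestral
    }


if __name__ == "__main__":
    # Print the commands for inspection instead of executing them.
    for cmd in build_pipeline("samoan_snp", "samoan_wgs.vcf", "bams.list"):
        print(" ".join(cmd))
```

Printing the commands rather than executing them makes it possible to check each stage by hand before any data files are modified; the same lists could later be passed to subprocess.run once the paths and thresholds have been confirmed.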
Results
When 13 Samoan exomes were run through SelecT, no regions were identified as being under positive selection within the metabolic pathway; furthermore, no such regions were identified anywhere in this population.
Conclusion
No regions were found to be under selective pressure, but this may be due to (a) the low sample size, (b) the use of exomes instead of whole genomes, or (c) errors made during the data preparation process. After communicating with the program developers, it was decided that the lack of results was most likely due to the small sample of exome data rather than improper data preparation. Currently, larger sets of Samoan genetic data are being collected, and preparation for collecting whole-genome information is underway. It is anticipated that with whole-genome data and larger sample sizes, the research questions posed by this project can be answered within a short amount of time, providing results that may offer great insight into the evolution of our own species.