Jeffrey Staples and Dr. Keith Crandall, Biology
Introduction
Geneticists have successfully identified the sources of many diseases that are inherited according to Mendelian rules (e.g., cystic fibrosis and Huntington’s disease). However, not all diseases are caused by a single faulty gene; so-called complex diseases (e.g., asthma, obesity, and diabetes) can result from any number of mutated genes, faulty pathways, or environmental conditions. Association studies have found that a particular gene family, the angiopoietin-like proteins (ANGPTL), is correlated with several of these complex diseases, namely diabetes, obesity, and heart disease [1, 2]; however, the function of these genes and their contribution to these diseases are not completely understood.
I believe that analyzing selection on the ANGPTL gene sequence will yield new insights into the gene’s contribution to the genetic basis of these diseases. If no sites in ANGPTL are under selection, then mutations in the gene should have minimal impact on the diseases. Alternatively, if sites are under selection, identifying them will show which regions of the gene are most likely to matter for disease. We will use TreeSAAP [3], a bioinformatic tool that uses amino acid properties to be more sensitive to true changes in phenotype, to discover these regions. TreeSAAP was also used to identify the mutation that causes obesity in Pima Indians [4]. We will apply these techniques to ANGPTL.
Materials and Methods
Data Collection – We received an ANGPTL4 sequence dataset covering over a hundred people from Taylor Maxwell, our collaborator in Houston. We supplemented these data with animal sequences of these genes from NCBI to build a deeper phylogenetic tree.
ANGPTL Phylogenetics – We aligned the sequences using MAFFT. Next, we used the program PSODA (written at BYU) to build phylogenetic trees for each gene, performing maximum likelihood heuristic searches and generating parsimony trees. We also ran Bayesian analyses with MrBayes v3.04. Models for these analyses were determined by Modeltest v3.6.
Codon and Substitution Analyses and Identification of Selective Influences – TreeSAAP improves on the traditional method of using dN/dS ratios because it identifies positive selection using amino acid properties, making it more sensitive to true changes in phenotype. We used TreeSAAP and PolyPhen to analyze the non-synonymous variants in the dataset. We then ran our alignment through TCS and plotted our results from TreeSAAP and PolyPhen.
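To make the distinction behind dN/dS ratios concrete, the following minimal Python sketch classifies codon differences between two aligned coding sequences as synonymous (same amino acid) or non-synonymous (different amino acid). It is only an illustration: it does not estimate dN/dS, and it is not how TreeSAAP or PolyPhen work internally. The function name and the toy sequences are hypothetical and are not taken from our dataset.

```python
from itertools import product

# Standard genetic code, built in TCAG order (T, C, A, G at each codon position).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(reference, variant):
    """Return (codon_position, ref_aa, var_aa, kind) for each differing codon.

    Both inputs are assumed to be aligned, in-frame coding DNA of equal length.
    Codons containing gaps or ambiguous bases are skipped.
    """
    if len(reference) != len(variant) or len(reference) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal in length")

    changes = []
    for start in range(0, len(reference), 3):
        ref_codon = reference[start:start + 3].upper()
        var_codon = variant[start:start + 3].upper()
        if ref_codon == var_codon:
            continue
        if ref_codon not in CODON_TABLE or var_codon not in CODON_TABLE:
            continue  # gap or ambiguous base: cannot be classified
        ref_aa, var_aa = CODON_TABLE[ref_codon], CODON_TABLE[var_codon]
        kind = "synonymous" if ref_aa == var_aa else "nonsynonymous"
        changes.append((start // 3 + 1, ref_aa, var_aa, kind))
    return changes


# Hypothetical toy sequences (Met-Glu-Leu versus Met-Lys-Leu).
for change in classify_codon_changes("ATGGAACTG", "ATGAAACTA"):
    print(change)
```

Only the non-synonymous changes reported by a check like this would be passed on to tools such as TreeSAAP and PolyPhen, which then ask how severe each amino acid replacement is rather than merely whether one occurred.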
Results
Each variant was tested for the effect of the substitution on protein fitness using TreeSAAP and PolyPhen. PolyPhen assigns a variant’s effect on protein function to one of three classes: benign, possibly damaging, or probably damaging, with the latter two considered significant. TreeSAAP assigns variants to one of eight categories, with categories 6-8 being unambiguously significant [3]. The results from these two tests are displayed in Figure 1.
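The significance criteria described above can be sketched as a small Python helper. This is a minimal illustration, not the script used in this study: the function name and the example inputs are hypothetical, and it assumes that the PolyPhen call is available as a text label and that the TreeSAAP category is an integer from 1 to 8.

```python
# Thresholds as stated in the text: "possibly damaging" and "probably damaging"
# count as significant for PolyPhen; categories 6-8 count for TreeSAAP.
POLYPHEN_CLASSES = {"benign", "possibly damaging", "probably damaging"}
POLYPHEN_SIGNIFICANT = {"possibly damaging", "probably damaging"}
TREESAAP_SIGNIFICANT = range(6, 9)  # categories 6, 7, and 8


def summarize_variant(polyphen_call, treesaap_category):
    """Report whether a variant is significant under each method and under both.

    Assumes categories are numbered 1 through 8 (an assumption of this sketch).
    """
    call = polyphen_call.strip().lower()
    if call not in POLYPHEN_CLASSES:
        raise ValueError(f"unknown PolyPhen class: {polyphen_call!r}")
    if treesaap_category not in range(1, 9):
        raise ValueError(f"TreeSAAP category out of range: {treesaap_category!r}")

    polyphen_hit = call in POLYPHEN_SIGNIFICANT
    treesaap_hit = treesaap_category in TREESAAP_SIGNIFICANT
    return {
        "polyphen": polyphen_hit,
        "treesaap": treesaap_hit,
        "both": polyphen_hit and treesaap_hit,
    }


# Hypothetical inputs for illustration only.
print(summarize_variant("Probably damaging", 7))
print(summarize_variant("benign", 6))
```

Reporting the two methods separately as well as jointly keeps visible the cases where only one tool flags a variant, which are the ones most in need of follow-up.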
Conclusions
Figure 1 shows that TreeSAAP and PolyPhen accurately predicted all of the known harmful mutations. Our results also suggest other mutations that are likely to contribute to disease and should be verified with wet lab experiments. In addition, Taylor Maxwell combined our results with his phenotypic data and confirmed that PolyPhen and TreeSAAP together provide a potentially powerful way to make predictions from simple sequence analyses. Our results are included in a paper submitted to the International Journal of Molecular Sciences. We expect this article to be published in the special issue “Cladistic Analysis and Molecular Evolution.”
References
- Willer et al. Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat Genet. 2008 Feb; 40(2), 129-30.
- Koster, A. et al. Transgenic angiopoietin-like (angptl)4 overexpression and targeted disruption of angptl4 and angptl3: regulation of triglyceride metabolism. Endocrinology 146, 4943–4950 (2005).
- McClellan, D.A. and McCracken, K.G. (2001) Estimating the influence of selection on the variable amino acid sites of the cytochrome b protein functional domains. Mol. Biol. Evol., 18, 917–925.
- Chamala et al., Evolutionary selective pressure on three mitochondrial SNPs is consistent with their influence on metabolic efficiency in Pima Indians. Int J Bioinform Res Appl. 2007;3(4):504-22.