David Morris and Dr. Keith Crandall, Department of Biology
Evolutionary biology is not a branch of biology that spends much time in the public eye. It just doesn’t have flashy results like cloning or evocative naturalist documentaries to captivate the imagination. However, this doesn’t mean that evolutionary biology is any less essential to our understanding of life, or that it has any less potential to affect people in their daily lives.
A perfect example may be seen in viruses. These small infectious agents evolve quickly and often switch hosts, directly impacting people through such new illnesses as avian and swine flu. This same trend impacts flu shots, as experts must determine which strains are most likely to infect people during the new flu season. When it comes to pathogens, understanding and predicting their evolutionary patterns not only increases the sum of human knowledge but saves lives as well.
Eduardo Castro Nallar, a graduate student working under Keith Crandall, took an interest in polyomavirus. Polyomavirus can cause cancer in immunologically compromised organisms, and is found in host animals as diverse as bandicoots, sea lions, canaries and, yes, people. For individuals whose immune systems are weakened by immunosuppressants or AIDS, cancer is the last thing they want to worry about! I was assigned to help Eduardo analyze the evolutionary history of polyomavirus.
The evolutionary relationships among a group of organisms are called a phylogeny, which is often represented in a tree format. Evolutionary theory implies that the most genetically similar species are the most closely related, because fewer changes (or mutations) have occurred since the two species diverged. Normally, the first step of such a project is to obtain genetic sequences from the organisms of interest, and then align them all to show where duplications and deletions happened. Once all the sequences are aligned, computer algorithms can be used to determine which species, or viral strains in this case, are related to which. This had already been done by the time I joined the project, so we were able to jump directly to the fun stuff: analysis of the data!
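To make the idea of "genetically similar means closely related" concrete, here is a minimal Python sketch that computes the proportion of differing sites (often called the p-distance) between aligned sequences. The strain names and sequences are made up for illustration, and this simple measure is only a stand-in: it is not the method used in our project, where tree-building software does far more than compare pairs.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ("-") are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


def pairwise_distances(aln: dict) -> dict:
    """Distance for every unordered pair of names in the alignment."""
    return {
        (n1, n2): p_distance(aln[n1], aln[n2])
        for n1, n2 in combinations(sorted(aln), 2)
    }


# Hypothetical toy alignment (not real polyomavirus data).
alignment = {
    "strain_A": "ATGCTAGCTA",
    "strain_B": "ATGCTAGCTT",
    "strain_C": "ATG-TCGATT",
}

distances = pairwise_distances(alignment)
# The pair with the smallest distance is the best candidate for "closest relatives".
closest_pair = min(distances, key=distances.get)
print(closest_pair, distances[closest_pair])
```

In this toy example the two strains that differ at only one site come out as the closest pair, which is the intuition behind grouping them together on a tree.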
The next step is to take that phylogeny and compare it to a phylogeny of the host organisms. Further computer algorithms assign a cost to each host-switching event and then determine the simplest explanation for how the virus may have jumped to so many hosts. Parasite-host cophylogenies are common, but we wanted to try something different with our methodology. We wanted to use a different software package, BEAST, that is often used for phylogeography. In a phylogeography analysis you essentially compare a phylogeny of organisms to geographic locations, which are presumed to isolate different populations. Recognizing that a host species could be substituted for a location, we wanted to try the same thing. We believed that the BEAST algorithms would be both effective and efficient in creating a parasite-host cophylogeny.
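The idea of treating the host as a label on each viral strain, and charging a cost for every host switch, can be sketched with Fitch parsimony, a classic and much simpler technique than either the cophylogeny programs or BEAST. The tree shape, strain names, and hosts below are hypothetical, and every switch is assumed to cost the same.

```python
def fitch(tree, host_of):
    """Fitch parsimony on a rooted binary tree written as nested tuples.

    Returns (set of equally good hosts at this node,
             minimum number of host switches in the subtree).
    """
    if isinstance(tree, str):  # a leaf: one sampled viral strain
        return {host_of[tree]}, 0
    left, right = tree
    left_hosts, left_cost = fitch(left, host_of)
    right_hosts, right_cost = fitch(right, host_of)
    shared = left_hosts & right_hosts
    if shared:
        # Both sides can agree on a host, so no switch is needed here.
        return shared, left_cost + right_cost
    # No host in common: at least one switch happened at this split.
    return left_hosts | right_hosts, left_cost + right_cost + 1


# Hypothetical viral tree and host assignments (illustration only).
virus_tree = ((("v1", "v2"), "v3"), ("v4", "v5"))
hosts = {
    "v1": "human",
    "v2": "human",
    "v3": "bird",
    "v4": "sea_lion",
    "v5": "sea_lion",
}

root_hosts, switches = fitch(virus_tree, hosts)
print("minimum host switches:", switches)
print("equally parsimonious root hosts:", sorted(root_hosts))
```

Here three hosts on one tree require at least two switches, and the root host is ambiguous. Real analyses weigh switches against other events and account for branch lengths, which this sketch ignores.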
Of course, we could only see if BEAST performed properly by comparing it to accepted cophylogeny methods. Therefore, we picked a number of different “classic” cophylogeny programs against which we could check our results.
Disaster struck just as we’d begun to get a handle on using BEAST. A study had been published on different filtering programs used to “clean” messy genetic data before alignment (Jordan and Goldman 2012). It turned out that GBlocks, the program used by the people who had previously assembled the polyomavirus sequences, performed poorly and wasn’t recommended for use. In one stroke our data was rendered useless.
Therefore, I went back to GenBank to retrieve fresh sequence data. I spent the next several weeks finding all the polyomavirus sequences I could, aligning and filtering them (using T-Coffee this time, a much preferred software choice), and then recreating a phylogeny of the different polyomavirus strains. Since then, we have rerun all of the analyses we had performed previously, and I am currently generating results using some more classic cophylogeny programs so we can compare BEAST to them. While our analyses are not yet complete, our preliminary results are promising.
For the most part, the transitions from host to host match what we have come to expect from other software, which supports the use of BEAST for cophylogeny analyses. Our only concern is that BEAST has indicated that polyomavirus originated in humans. However, we believe this may be due to an overrepresentation of human strains in our data, which we will test for.
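One way such a test could be set up is to cap the number of strains kept per host and rerun the analysis on the reduced data set. The Python sketch below shows only the subsampling step, with made-up strain names and a hypothetical helper, `downsample_by_host`; it is an assumption about how the check might be done, not a description of our actual procedure.

```python
import random
from collections import defaultdict


def downsample_by_host(host_of, per_host, seed=0):
    """Keep at most `per_host` randomly chosen strains for each host."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_host = defaultdict(list)
    for strain, host in sorted(host_of.items()):
        by_host[host].append(strain)
    kept = []
    for host in sorted(by_host):
        strains = by_host[host]
        kept.extend(rng.sample(strains, min(per_host, len(strains))))
    return sorted(kept)


# Hypothetical data set in which human strains dominate.
strain_hosts = {
    "h1": "human", "h2": "human", "h3": "human",
    "h4": "human", "h5": "human", "h6": "human",
    "b1": "bird", "b2": "bird",
    "s1": "sea_lion",
}

subset = downsample_by_host(strain_hosts, per_host=2)
print(subset)
```

If the inferred origin stays the same across several such subsets drawn with different seeds, sampling bias becomes a less likely explanation; if it changes, the original result deserves more suspicion.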
There have been setbacks, both in data preparation and in using the various software packages, which have caused us to push back our initial projections for time of publication. However, the project is nearing completion and work on our paper has begun. We excitedly await the time when we will be able to submit our work for academic publication!
References
- Jordan, Gregory and Goldman, Nick. The Effects of Alignment Error and Alignment Filtering on the Sitewise Detection of Positive Selection. Mol. Biol. Evol. 29(4): 1125-1139. 2012. doi:10.1093/molbev/msr272