Mono-Associated Gnotobiotic Animal Meta-Genome Wide Association R Package

Penrod, Corinne
MAGNAMWAR

Faculty Mentor: John Chaston, Plant & Wildlife Science

Introduction

Animal-associated bacteria fundamentally influence the normal function and behavior of their
hosts, and understanding the genetic basis for these effects is an ongoing challenge. The
traditional approach to understanding the phenotypic effects of genes is mutant analysis
through a genetic screen. MAGNAMWAR is an R package that provides a simple and effective
pipeline for predicting gene function, intended as a faster alternative to a genetic screen. It
uses meta-genome wide association (MGWA), which relates bacterial genotype to host
phenotype, to predict bacterial gene function. The pipeline has proven to be an effective gene
predictor in work done with Drosophila melanogaster, and it can be applied to any dataset
collected from mono-associated hosts. Currently, no R package provides a simplified MGWA
process to researchers. As open source software, MAGNAMWAR will allow researchers to
streamline gene function predictions for use in future experiments.

Methodology

The conceptual basis for the package is to use bacterial genetic content and host phenotype data
to perform an MGWA. An MGWA differs slightly from a traditional genome wide association
study (GWAS): a GWAS analyzes phenotype and genotype data that are both collected from the
host, whereas an MGWA analyzes the phenotype of a host and the genotype of the bacterium
within that host to determine significant effects. MAGNAMWAR therefore requires phenotypic
data recorded from a set of mono-associated hosts, each associated with a different bacterial
taxon. It also requires the genetic content of these host-associated taxa to be clustered
using the OrthoMCL clustering software to produce clustered orthologous groups (COGs) of
genetically similar proteins. The COG data and the phenotypic measurements provide all the
information needed to run the MGWA study.
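
The idea described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not MAGNAMWAR's R interface. It rests on several assumptions that do not come from the package itself: the clustering output is taken to be one line per group in the form "ID: taxon|protein taxon|protein ...", each taxon is reduced to a single phenotype value, and an exact permutation test on the difference in means stands in for whichever statistical test a real analysis would use. The function names, taxon names, and phenotype values are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def parse_groups(lines):
    """Map each COG id to the set of taxa that carry at least one member protein.

    Assumes lines shaped like 'cog1: taxonA|prot1 taxonB|prot2'.
    """
    cogs = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        cog_id, members = line.split(":", 1)
        cogs[cog_id.strip()] = {m.split("|", 1)[0] for m in members.split()}
    return cogs


def exact_permutation_p(present, absent):
    """Two-sided exact permutation p-value for the difference in group means."""
    pooled = present + absent
    n_present = len(present)
    n_absent = len(absent)
    total = sum(pooled)
    observed = abs(mean(present) - mean(absent))
    hits = 0
    count = 0
    # Enumerate every way of relabelling n_present taxa as "COG present".
    for idx in combinations(range(len(pooled)), n_present):
        s = sum(pooled[i] for i in idx)
        diff = abs(s / n_present - (total - s) / n_absent)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
        count += 1
    return hits / count


def run_mgwa(cogs, phenotype):
    """Test each COG for association between its presence and the host phenotype."""
    results = {}
    for cog_id, carriers in cogs.items():
        present = [v for taxon, v in phenotype.items() if taxon in carriers]
        absent = [v for taxon, v in phenotype.items() if taxon not in carriers]
        if not present or not absent:
            continue  # a COG found in every taxon (or none) carries no signal
        results[cog_id] = exact_permutation_p(present, absent)
    return results


# Hypothetical input: three COGs across six taxa, one phenotype value per taxon.
group_lines = [
    "cog1: A|p1 B|p2 C|p3",
    "cog2: A|p4 D|p5",
    "cog3: A|p6 B|p7 C|p8 D|p9 E|p10 F|p11",
]
phenotype = {"A": 10.0, "B": 11.0, "C": 12.0, "D": 3.0, "E": 2.0, "F": 1.0}

results = run_mgwa(parse_groups(group_lines), phenotype)
for cog_id, p_value in sorted(results.items()):
    print(cog_id, p_value)
```

In this toy data set, cog1 is carried by exactly the three taxa with the highest phenotype values and receives p = 0.1, the smallest value an exact test can give when six taxa are split three and three. The phenotype means for cog2 are identical with and without the group, so it receives p = 1.0, and cog3 is skipped because every taxon carries it. A real analysis would involve many more taxa, replicate hosts per taxon, and correction for multiple testing.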

Results

Using the methods described above, we developed a simplified R package that streamlines the
process of running an MGWA on a mono-associated organism. The package provides a road map
for novice programmers on how to run an effective MGWA. We designed the pipeline to be easy
to use and provided several forms of documentation, including an R vignette. Vignettes in R
are long-form documentation that helps a user understand the workflow of the software.

Discussion

The software described above is currently being beta tested by researchers who are familiar with
our research pipeline, in preparation for publication on the Comprehensive R Archive Network
(CRAN). Once beta testing is complete and the package is published on CRAN, it will be
freely available for download.

Conclusion

By combining clustering of genetic content with MGWA, we provide an easy and effective
way to predict candidate genes for mutant analysis. Although MAGNAMWAR was originally
developed to explore the microbiome with Drosophila melanogaster as the host organism, the
pipeline can be used on any mono-associated host organism, or even on bacterial
genotype/phenotype relationships such as biofilm formation on plastic. The package is designed
to be simple for researchers in many different fields to use. Because the statistical tests
within MAGNAMWAR are customizable, it can accommodate a wide variety of data sets. The goal
of this R package is to make accurate predictions and to be simple to use. We presented our
research at the ASM Conference on Beneficial Microbes in 2016 and at the BIOT Conference in
December of 2015. We also hope to publish an application note in the Oxford journal
Bioinformatics and to release the software as open source on CRAN.

Brigham Young University

Journal of Undergraduate Research
