Phenome-Wide Association Study of Y Chromosome Genetic Markers
Faculty Mentor: Mary Davis, Microbiology and Molecular Biology
The purpose of this project was to analyze genetic variants on the Y chromosome for
significant association with various diseases. In Dr. Mary Davis’ lab we gained access to
a unique data set that allows us to analyze many single-nucleotide polymorphisms
(SNPs) on the Y chromosome. This was a hypothesis-generating study with the
potential to challenge the currently accepted view that genetic variants on the
Y chromosome do not play a significant role in common disease.
The Y chromosome is a special case because it is passed down patrilineally. Recent
studies have shown that certain medical conditions, such as cardiovascular disease, are
influenced by the specific Y chromosome that a man carries. One study conducted in
2012 found a strong association between coronary heart disease and a specific Y
chromosome that is found most commonly in men of Scandinavian origin.
In this study, we were able to look at hundreds of different diseases and their
association with the Y chromosome. We did this through an analysis called a phenome-wide
association study (PheWAS), which uses simple chi-squared statistics on a large
scale to identify medical conditions extracted from electronic health records that are
associated with specific Y-chromosome variations.
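As a minimal sketch of the test that a PheWAS repeats for each variant and each condition, the Python function below computes a Pearson chi-squared statistic and p-value for a single 2x2 table of carrier status by disease status. It is not the code used in this study, which relied on R and the PheWAS package; the function name and the counts in the example are hypothetical and chosen only for illustration.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (1 degree of freedom, no continuity correction).

    Hypothetical table layout:
                      cases   controls
        carriers        a        b
        non-carriers    c        d
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one individual")
    # Shortcut formula for the Pearson statistic of a 2x2 table.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 100 carriers (30 cases), 400 non-carriers (60 cases).
chi2, p = chi_squared_2x2(30, 70, 60, 340)
print(f"chi2 = {chi2:.3f}, p = {p:.2e}")
```

In a full scan, a test like this would be run once per variant and condition pair, which is why the number of hypotheses grows so quickly.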
There were three stages to this project. The first stage was a proof of concept. Using a
limited sample size of 677 individuals and 439 Y chromosome SNPs, we aimed to
perform a PheWAS. This analysis was done using the R programming language and a
statistical genetics package, also named PheWAS, which was produced by Vanderbilt.
The second part of this study was a concerted effort on behalf of our entire lab. We
were asked to participate in the YGen Consortium, which is a large international
consortium of scientists who are investigating the role of the Y chromosome in
influencing human traits such as BMI, height, waist circumference, birth weight, white
blood cell counts, etc. We gained access to a large data set through Vanderbilt
University that contained 14,176 DNA samples with 69 SNPs for each individual. My
role in this group effort was to parse and prepare the Y chromosome data to be
analyzed by the statistical software that was provided by the consortium.
The third part of this study included our own independent statistical analysis of the large
YGen sample, testing for association with hundreds of diseases found in electronic
health records. Our goal was to expand the YGen study from a few dozen traits to all
diseases found in a vast electronic health record database. We are still in the early
stages of this analysis, and we aim to collaborate with other institutions to provide
convincing evidence for associations between the Y chromosome and disease.
In the first stage of this study, the proof-of-concept stage, we did not find any
significant results. When testing so many hypotheses simultaneously, we had to apply
a more stringent threshold for what we accepted as a positive result, and no results passed
this threshold. This was, however, an important first step on our path to
future discovery that prepared us for the subsequent analyses.
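This report does not state which multiple-testing correction was applied. As one common and conservative possibility, a Bonferroni correction divides the significance level by the number of tests performed. The sketch below assumes that approach; the function names are hypothetical, and the phenotype count of 500 is a placeholder, since the study reports only "hundreds" of diseases.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_results(p_values, alpha=0.05):
    """Keep only the tests whose p-value falls below the corrected threshold.

    p_values maps a test label to its p-value; every entry counts as one test.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {label: p for label, p in p_values.items() if p < threshold}


n_snps = 439        # SNP count from the proof-of-concept stage
n_phenotypes = 500  # ASSUMPTION: placeholder, not a figure from this study
threshold = bonferroni_threshold(0.05, n_snps * n_phenotypes)
print(f"Per-test threshold for {n_snps * n_phenotypes} tests: {threshold:.2e}")
```

Under these assumed numbers, a result would need a p-value far below the conventional 0.05 to count as significant, which illustrates why a small sample may yield no passing associations.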
In the second stage of our study, the YGen Consortium study, we were able to prepare
and analyze the data so that we could meet the deadlines imposed by the consortium.
Our initial results indicate that decreased height is strongly associated (p-value =
0.000286) with Y chromosome haplogroup Q1b-S289, which is spread throughout
Europe, Central Asia, and South Asia. A second SNP is associated with increased
height (p-value = 0.000949) and occurs in Y chromosome haplogroup R1b-S263, which
is found primarily in men of western European origin. Our results will be compared with
the results of several other groups in an attempt to cross-validate the findings. We
anticipate that this will result in a strong publication in a high-impact journal.
In the third stage, early results suggested that certain conditions found in the medical
record could be associated with variation on the Y chromosome. The most significant
association (p-value = 2.47 × 10⁻⁸) was found between “Iatrogenic pulmonary embolism
and infarction” and the rs34442126 variant, which is associated with haplogroup N, a Y
chromosome that is found most frequently in northern Eurasia. Other conditions that
show strong association with Y chromosome variation include “Pingueculitis”, “Meckel’s
diverticulum”, and “Acromegaly and gigantism”. At this point, these results need to be
further validated through our collaborations.
The results described here represent an early effort to identify associations between the
Y chromosome and disease. As with any new scientific endeavor, the methods used
and described in this study will take time to be developed and validated. The early
results are promising. The association between height and the Y chromosome is of special
interest because it suggests that the Y chromosome may contribute directly to the
development of sexual dimorphism in height, and not only through
the sex-determining region Y (SRY) gene. Our results also seem to support the
association between Y chromosome variation and cardiovascular disease.
In conclusion, my advisor, my lab mates, and I have demonstrated that investigating the
Y chromosome as it relates to various phenotypes and diseases is likely to identify
important genetic associations. Several stages of this study are still in progress, and
they must be reviewed in detail before any strong conclusions can be drawn. We
anticipate that future studies will continue to shed further light on these associations.