Lyndsay A. Staley and John S.K. Kauwe, Biology
Introduction
In genome-wide association studies (GWAS), the use of endophenotypes, or intermediate traits, has been found to provide novel insights into the genetics of complex human diseases and the pathways and proteins associated with them. Cerebrospinal fluid (CSF) has been shown to contain analytes that may provide insight into disease pathways that may not be identified using blood or other biological fluids. The analytes in this study were selected from the Rules Based Medicine, Inc. (RBM) (Austin, TX) Human Discovery Panel 1.0, which includes a range of signaling, structural, and trafficking proteins that have previously shown relationships with human disease pathology. Our research identified a phenotype that is significantly associated with prostate cancer and moderately associated with bone diseases and blood cell diseases. The identification of biomarkers is vital to understanding the genetics of disease because biomarkers can provide insights into the mechanisms contributing to disease.
Methods
CSF samples from both the Knight ADRC and ADNI series were evaluated for levels of 190 analytes using the Human DiscoveryMAP® panel and a Luminex 100 platform. Each series was filtered independently for coverage of the phenotype data (90%), and the intersection of the two filtered sets yielded the phenotype analyzed here.
Imputation in Illumina datasets
The 1000 Genomes Project data (June 2011 release) and the Beagle software were used to impute up to 6 million SNPs. SNPs were removed if they had a Beagle R2 of 0.3 or lower, a minor allele frequency (MAF) lower than 0.05, a departure from Hardy-Weinberg equilibrium (p < 1×10^-6), a call rate lower than 95%, or a Gprobs score lower than 0.90. A total of 5,815,690 SNPs passed the QC process.
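The QC thresholds above can be sketched as a simple per-SNP filter. This is a minimal illustration in Python, not the pipeline used in the study. The `SnpQc` record and its field names are hypothetical, and the handling of values exactly at a cutoff follows the wording above (an R2 of 0.3 or lower is removed, while the other limits remove only values strictly below the cutoff).

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Hypothetical per-SNP QC summary; field names are illustrative only."""
    snp_id: str
    beagle_r2: float   # Beagle imputation R2
    maf: float         # minor allele frequency
    hwe_p: float       # Hardy-Weinberg equilibrium test p-value
    call_rate: float   # fraction of samples with a called genotype
    gprobs: float      # Gprobs score


def passes_qc(snp: SnpQc) -> bool:
    """Return True if the SNP survives all five filters described in the text."""
    return (
        snp.beagle_r2 > 0.3        # remove R2 of 0.3 or lower
        and snp.maf >= 0.05        # remove MAF lower than 0.05
        and snp.hwe_p >= 1e-6      # remove HWE p < 1e-6
        and snp.call_rate >= 0.95  # remove call rate lower than 95%
        and snp.gprobs >= 0.90     # remove Gprobs lower than 0.90
    )


def filter_snps(snps):
    """Keep only the SNPs that pass QC, preserving input order."""
    return [s for s in snps if passes_qc(s)]
```

For example, `filter_snps` applied to a list of `SnpQc` records returns only those records that clear every threshold.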
Statistical Analysis
A Kolmogorov-Smirnov goodness-of-fit test was performed in SAS to evaluate the normality of the PAP (prostatic acid phosphatase) measurements, which did not deviate significantly from normality. We performed a genome-wide association analysis for the phenotype to identify genetic loci associated with protein levels in CSF. For the initial association analysis in each series, we used PLINK to perform linear regression and evaluated the association between each of the 5.8M SNPs, under an additive model, and the phenotype. Age, gender, and the principal components from Eigensoft were included as covariates.
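To illustrate what the additive model means, the sketch below regresses a protein level on minor-allele dosage (0, 1, or 2 copies) by ordinary least squares, so that each additional copy of the allele shifts the expected level by the same amount. It is a simplified example under stated assumptions: it omits the age, gender, and principal-component covariates included in the PLINK analysis, and the function name and toy data are hypothetical.

```python
from statistics import fmean


def additive_fit(dosages, levels):
    """Least-squares fit of levels = intercept + beta * dosage.

    dosages: minor-allele counts per sample (0, 1, or 2).
    levels:  the quantitative phenotype per sample.
    Returns (beta, intercept). Covariates are deliberately left out.
    """
    if len(dosages) != len(levels):
        raise ValueError("dosages and levels must have the same length")
    mean_x = fmean(dosages)
    mean_y = fmean(levels)
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, levels))
    beta = sxy / sxx
    return beta, mean_y - beta * mean_x


# Toy data: each copy of the minor allele adds 0.5 units to the level.
beta, intercept = additive_fit([0, 1, 2, 0, 1, 2], [1.0, 1.5, 2.0, 1.0, 1.5, 2.0])
```

In this toy data set the fitted `beta` is the per-allele effect, which is the quantity whose sign is compared between the two series in the filtering step.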
Meta-analysis was performed using default settings in METAL. Genomic inflation factor (GIF) scores were estimated using the R package GenABEL. The initial threshold for significance was P < 5×10^-8. SNPs that met this threshold were further filtered using the following criteria. First, we rejected markers where the direction of the effect differed between the Knight ADRC and ADNI datasets. Second, we removed all markers where the minor allele frequency was less than 5% (unless they were directly genotyped or had a clear functional annotation). Finally, we rejected all associations with phenotypes where the genomic inflation factor was greater than 1.03 (GIF was calculated after excluding SNPs with MAF < 0.05). The strict and extremely conservative study-wide alpha level, which starts from an initial alpha of 5×10^-8 and adjusts for an additional 59 phenotypes, is 8.47×10^-10.
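The study-wide alpha and the three post-hoc filters can be written out as a short sketch. The value 8.47×10^-10 corresponds to dividing 5×10^-8 by 59. The function names and arguments below are hypothetical, and the inflation factor shown is the common median-based estimator, which is an assumption and may differ from the estimator GenABEL applied in the study.

```python
from statistics import NormalDist, median

GENOME_WIDE_ALPHA = 5e-8
N_PHENOTYPES = 59
# Bonferroni-style adjustment: 5e-8 / 59 is roughly 8.47e-10.
STUDY_WIDE_ALPHA = GENOME_WIDE_ALPHA / N_PHENOTYPES

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Median-based inflation factor from two-sided p-values in (0, 1]."""
    nd = NormalDist()
    # A two-sided p-value maps to a 1-df chi-square statistic via z squared.
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


def keep_hit(p_meta, beta_adrc, beta_adni, maf, gif,
             genotyped=False, functional=False, alpha=GENOME_WIDE_ALPHA):
    """Apply the significance threshold and the three filters from the text."""
    if p_meta >= alpha:
        return False  # not significant at the chosen threshold
    if beta_adrc * beta_adni <= 0:
        return False  # effect direction differs between the two series
    if maf < 0.05 and not (genotyped or functional):
        return False  # rare and neither genotyped nor functionally annotated
    if gif > 1.03:
        return False  # phenotype-level inflation too high
    return True
```

Passing `alpha=STUDY_WIDE_ALPHA` to `keep_hit` applies the stricter study-wide threshold instead of the initial genome-wide one.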
Bioinformatics analyses
We used ANNOVAR to annotate SNPs of interest with location and functional information. We used SIFT and PolyPhen-2 for preliminary assessments of the functional consequences of amino acid changes.
Results
Using P < 5×10^-8, we were left with 228 genome-wide significant markers in the CSF discovery analysis, 55 of which were also significant in the plasma data. All significant markers were located near the ACPP gene, which encodes PAP. We did not observe any coding changes. Two of the markers, rs3889987 and rs73215971, had scores indicating functional impact in RegulomeDB. The marker rs3889987 was scored 2a (TF binding + matched TF motif + matched DNase Footprint + DNase peak), and rs73215971 was scored 3a (TF binding + any motif + DNase peak).
Discussion
Is this signal likely to be real? We believe so. Which SNPs drive the signal? The strongest candidates are rs3889987 and rs73215971. Does this association impact other phenotypes or diseases? This question requires further thorough study, because PAP is known to be involved in blood and bone diseases in addition to cancer.
Conclusion
We have identified a genome-wide significant association for PAP. This association is consistent across two independent samples and two independent biological fluids. It spans a large LD block that includes the structural gene for PAP. While further work is required to identify the causal variant(s) for this signal, two SNPs, rs3889987 and rs73215971, appear to be meaningful candidates.