Kevin Boehme and Dr. John “Keoni” Kauwe, Biology
Introduction
Alzheimer’s disease (AD) is a common and complex neurodegenerative disease. It is the most common cause of dementia and is characterized by the accumulation of amyloid plaques and neurofibrillary tangles. To date, many genetic loci have been found that modify AD risk, but collectively, they explain only a fraction of the heritability of the disease. It is hypothesized that rare variants with large effects as well as epistatic interactions account for much of the unexplained heritability in AD.
A recent study by Ebbert et al. found evidence of two gene-gene interactions among three known AD genes that increase AD risk: CLU-MS4A4E and CD33-MS4A4E. Specifically, Ebbert et al. reported interactions between the rs11136000 C/C (CLU) and rs670139 G/G (MS4A4E) genotypes (synergy factor = 3.81; p = 0.016), and between the rs3865444 C/C (CD33) and rs670139 G/G (MS4A4E) genotypes (synergy factor = 5.31; p = 0.003). All three genes are on the “AlzGene Top Results” list, which summarizes the most established genes associated with AD to date.
In this study, we attempted to replicate these gene-gene interactions by performing an independent meta-analysis of datasets from the Alzheimer’s Disease Genetics Consortium (ADGC), followed by a combined meta-analysis including the original Cache County data. The CLU-MS4A4E interaction replicated in both the independent and combined meta-analyses, while the CD33-MS4A4E interaction failed to replicate.
Methodology
Data Collection
Data for this project come from the Alzheimer’s Disease Genetics Consortium (ADGC). This consortium consists of 30 studies and contains over 20,000 samples with AD case/control status, SNP data, and APOE genotypes. Since gene-gene interactions are challenging to identify and replicate, we used only the highest quality data possible. For each ADGC dataset we removed SNPs with low quality scores (info < 0.5) and SNPs with a high proportion of missing values (genotyping rate < 95%). We then removed samples with a genotyping rate below 90% across all SNPs, ensuring a well-genotyped sample pool. We extracted the three SNPs of interest, rs3865444 (CD33), rs670139 (MS4A4E), and rs11136000 (CLU), and tested each for Hardy-Weinberg equilibrium. Using R, we excluded all samples that did not have complete data for all covariates, including age, sex, cohort, case-control status, APOE ε4 dose, and the two SNPs being tested in the corresponding interaction. Any datasets missing the respective SNPs or covariates after data cleaning were excluded from further analysis.
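The filtering thresholds above can be sketched in a few lines of Python. This is only an illustration of the stated cutoffs, not the pipeline used in the study: the data layout (a dictionary of per-SNP call lists with None for a missing call), the function names, and the choice of a 1-df chi-square statistic for the Hardy-Weinberg check are all assumptions made for the example.

```python
# Thresholds taken from the text; everything else here is hypothetical.
INFO_MIN = 0.5               # drop SNPs with info score below this
SNP_CALL_RATE_MIN = 0.95     # drop SNPs genotyped in fewer than 95% of samples
SAMPLE_CALL_RATE_MIN = 0.90  # drop samples genotyped at fewer than 90% of SNPs


def call_rate(calls):
    """Fraction of non-missing genotype calls (None marks a missing call)."""
    return sum(c is not None for c in calls) / len(calls)


def filter_snps(genotypes, info):
    """Keep SNPs that pass both the info-score and call-rate thresholds.

    genotypes: dict mapping SNP id -> list of calls, one per sample
    info:      dict mapping SNP id -> imputation info score
    """
    return {
        snp: calls
        for snp, calls in genotypes.items()
        if info[snp] >= INFO_MIN and call_rate(calls) >= SNP_CALL_RATE_MIN
    }


def filter_samples(genotypes, n_samples):
    """Return indices of samples whose call rate across SNPs is adequate."""
    keep = []
    for i in range(n_samples):
        calls = [genotypes[snp][i] for snp in genotypes]
        if calls and call_rate(calls) >= SAMPLE_CALL_RATE_MIN:
            keep.append(i)
    return keep


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 df) for deviation from Hardy-Weinberg
    proportions, given observed counts of the three genotypes."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

SNP filtering runs before sample filtering here, matching the order described in the text, so that poorly genotyped SNPs do not count against otherwise good samples.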
Statistical Analysis
Following data preparation, we tested the individual interactions in each dataset using logistic regression in R, adjusting for the covariates previously mentioned. We defined the R models as “case_control ~ rs3865444*rs670139 + apoe4dose + age + sex” and “case_control ~ rs11136000*rs670139 + apoe4dose + age + sex” for the CD33-MS4A4E and CLU-MS4A4E interactions, respectively; in R formula syntax, the “*” operator includes both the main effects and the interaction effect of the two SNPs. All analyses in this study used each gene’s homozygous minor-allele genotype as the reference group.
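To make the model structure concrete, the following Python sketch expands a formula of the form “a*b + apoe4dose + age + sex” into explicit design-matrix columns and evaluates the logistic link. It is a simplified illustration, not the R code used in the study: coding each SNP as a single 0/1 indicator (1 for the genotype of interest, 0 for the reference) is an assumption, as are the function names and the column order.

```python
import math


def design_row(g1, g2, apoe4dose, age, sex):
    """Expand 'g1*g2 + apoe4dose + age + sex' into explicit columns.

    g1, g2 are assumed to be 0/1 genotype indicators. The 'g1*g2' term
    contributes three columns: g1, g2, and their product (the interaction).
    """
    return [
        1.0,               # intercept
        float(g1),         # main effect of SNP 1
        float(g2),         # main effect of SNP 2
        float(g1 * g2),    # interaction: 1 only when both genotypes are present
        float(apoe4dose),  # number of APOE e4 alleles (0, 1, or 2)
        float(age),
        float(sex),
    ]


def predicted_risk(row, coefs):
    """Logistic model: probability of being a case for one design row."""
    eta = sum(x * b for x, b in zip(row, coefs))  # linear predictor (log-odds)
    return 1.0 / (1.0 + math.exp(-eta))
```

Under this coding, the coefficient on the product column is the quantity of interest: it captures the change in log-odds that appears only when both genotypes occur together, over and above the two main effects.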
Using results from each study, we performed a meta-analysis to test significance across the ADGC datasets using METAL, and performed a second meta-analysis including the Cache County results.
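As an illustration of how per-dataset results can be pooled, the sketch below implements a fixed-effect, inverse-variance-weighted meta-analysis of interaction coefficients on the log scale. METAL offers both a sample-size-weighted and a standard-error-weighted scheme, and the text does not say which was used, so this weighting is an assumption; the function name and inputs are likewise hypothetical.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect meta-analysis of per-study log-scale estimates.

    betas: per-study interaction coefficients (log synergy factors)
    ses:   their standard errors
    Returns (pooled beta, pooled SE, z-score, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]  # precise studies weigh more
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p
```

Pooling on the log scale and exponentiating afterwards keeps the estimate symmetric around the null value of zero; the pooled synergy factor would then be exp(beta).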
Our results are reported as synergy factors with their associated 95% confidence intervals and p-values. A synergy factor measures whether the joint effect of two interacting genetic variants is greater than expected from their individual effects: it is the ratio of the odds ratio for carrying both variants to the product of the odds ratios for carrying each variant alone. As with odds ratios, synergy factors less than one and greater than one suggest decreased and increased risk in case-control studies, respectively.
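A short Python sketch may help make the synergy factor concrete. It uses the standard definition as the joint odds ratio divided by the product of the individual odds ratios, and, for a logistic model with 0/1-coded variants, takes the exponentiated interaction coefficient as the synergy factor. The function names and the Wald-style confidence interval are assumptions for illustration, not a description of the exact calculations in the study.

```python
import math


def synergy_factor(or_both, or_first, or_second):
    """Synergy factor from odds ratios, all relative to the same reference
    group: joint OR divided by the product of the two single-variant ORs."""
    return or_both / (or_first * or_second)


def synergy_factor_ci(beta_interaction, se, z=1.96):
    """Synergy factor and approximate 95% CI from a logistic-regression
    interaction coefficient and its standard error (Wald interval)."""
    sf = math.exp(beta_interaction)
    lower = math.exp(beta_interaction - z * se)
    upper = math.exp(beta_interaction + z * se)
    return sf, lower, upper
```

For example, if carrying both variants gives an odds ratio of 6.0 while each variant alone gives 2.0 and 1.5, then synergy_factor(6.0, 2.0, 1.5) returns 2.0: the joint effect is twice what the individual effects would predict.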
Results
The originally reported CLU-MS4A4E interaction between the rs11136000 C/C (CLU) and rs670139 G/G (MS4A4E) genotypes replicated in both the independent (synergy factor = 2.37, p = 0.007) and combined (synergy factor = 2.71, p = 0.0004) meta-analyses, with minor evidence for a dosage effect in the combined meta-analysis (synergy factor = 1.73, p = 0.02).
The CD33-MS4A4E interaction failed to replicate in either the independent (synergy factor = 1.16, p = 0.78) or combined (synergy factor = 1.63, p = 0.24) meta-analyses.
Discussion
The CD33-MS4A4E interaction failed to replicate, and the original finding may have resulted from over-fitting in the Cache County data. However, we successfully replicated the CLU-MS4A4E interaction. Our results provide evidence for an interaction between these known AD SNPs that modulates risk above and beyond what each SNP contributes individually.
The biological explanation for this interaction is difficult to decipher. Little is known about MS4A4E. It belongs to a membrane-spanning protein family, and its role as a trans-membrane protein may facilitate aggregation. Membrane-spanning proteins play diverse roles in cell activity, including transport and signaling. Research suggests that CLU prevents the formation of amyloid fibrils and other protein aggregation events. Further in vivo and in vitro experiments are needed to characterize the biological mechanism by which these genes influence AD.
Because all analyses in this study used each gene’s homozygous minor-allele genotype as the reference group, the interaction between the major alleles is framed as a risk factor; equivalently, the interaction between the minor alleles is protective. The minor allele for CLU is protective, as is being APOE ε4 negative, while the minor allele for MS4A4E increases risk. From the minor-allele perspective, the interaction between CLU and MS4A4E is protective.
Conclusions
In this report, we successfully replicated the interaction between CLU and MS4A4E in a large, independent dataset. We show that, in terms of minor alleles, this interaction has a protective effect against AD. The interaction between CD33 and MS4A4E failed to replicate.