Examining Gene-Expression Patterns Across Rare Colon Cancer Syndromes to Identify Early Diagnosis and Treatment Options
Faculty Mentor: Stephen Piccolo, Biology
Colon cancer often runs in families. Individuals in these families have a relatively high lifetime
risk of developing colon cancer and often develop aggressive tumors at a relatively young age.
Currently, genetic testing is the standard way to diagnose hereditary colon cancer; however,
many people who develop hereditary colon tumors do not have a mutation in the tested genes.
Thus, a better diagnostic approach and a better understanding of how to target the
underlying molecular causes of hereditary colon cancer are needed.
Individuals who have a potentially high hereditary risk for colon cancer sometimes do not
receive a genetic diagnosis due to technical limitations or simply because the individuals do not
carry mutations in genes known to cause colon cancer. Our approach bypasses this
inherited-gene testing in favor of identifying active gene-expression patterns for colon cancer
syndromes and checking for matching patterns in individuals who are possibly at risk. These
patterns would indicate actual precancerous activity and overall risk regardless of the
inherited cause. We believe this may allow a more reliable and less expensive diagnosis, as well as
potential treatments that target the underlying biological causes of tumor development.
We sought to identify patterns of gene expression that indicate whether an individual has a low
or high number of polyps. This information could be used as a biomarker to predict who is most
likely to develop a large number of polyps and thus be at a higher risk of colon cancer. It may
also be useful in understanding the biological processes that influence polyp development. We
used gene-expression data from 180 patients for 41,000+ unique human genes and transcripts
(from Agilent whole genome microarrays) to find pathway-based changes affected by mutated
genes in the various syndromes. The data were provided by researchers at the University of Utah.
Our initial work involved an analysis of two experiments of particular interest. The first
experiment generated data for individuals with low vs. high polyp number. The individuals all
had the same genetic mutation but some developed hundreds of polyps, whereas other
individuals developed few polyps. The second experiment used patient-matched colonic tissue
samples, including both normal-appearing tissue and tissue with polyps for each patient. This
allowed a comparison of gene-expression activity in colon tissue before and after disease onset.
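The patient-matched design of the second experiment can be illustrated with a small sketch. It is not the analysis pipeline used in this study; it only shows, on made-up numbers, why pairing matters: each patient's polyp value is compared with the same patient's normal-appearing tissue, so differences between patients cancel out. The function name `paired_log_fold_change` and the toy values are hypothetical, and the values are assumed to be on a log2 scale.

```python
from statistics import mean


def paired_log_fold_change(normal, polyp):
    """Return the mean within-patient difference (polyp - normal) for each gene.

    Both arguments are lists of rows: one row per gene, one column per patient,
    on a log2 scale, with the columns in the same patient order.
    """
    if len(normal) != len(polyp):
        raise ValueError("both matrices must contain the same genes")
    result = []
    for normal_row, polyp_row in zip(normal, polyp):
        if len(normal_row) != len(polyp_row):
            raise ValueError("each gene needs one value per patient in both tissues")
        # Differencing within a patient removes patient-specific baseline levels.
        result.append(mean(p - n for n, p in zip(normal_row, polyp_row)))
    return result


# Toy data (invented for illustration): 2 genes x 3 patients.
normal = [[5.0, 6.0, 7.0],
          [8.0, 8.0, 8.0]]
polyp = [[7.0, 8.0, 9.0],
         [8.5, 7.5, 8.0]]

paired_lfc = paired_log_fold_change(normal, polyp)
```

In this toy example the first gene is consistently higher in polyp tissue for every patient even though baseline levels differ between patients, while the second gene shows no consistent change.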
We selected statistical tools for analysis that would maximize the potential strength of our
results. For our differential gene-expression analysis, we used the ComBat [1] and Limma [2] packages
written in the R programming language. These packages contain algorithms that helped us clean
the data and correct for potential biases. We used ComBat to remove batch effects,
which could otherwise produce apparent significance between batches where none should be detected.
Batch effects are measurable variations in the data due to differences in sample handling,
temperature, and any other potential differences that occur between batches. We consider
appropriately minimizing these effects to be crucial in generating more reliable statistical results.
Limma provides tools to normalize this type of data within and between arrays, and to identify
the top differentially expressed genes for the factors of interest. Using these tools, we identified
the most differentially expressed genes between the two groups for each experiment. The
pathways these genes are known to affect are potential areas of interest.
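The two steps described above, removing batch effects and then ranking genes by log fold-change, can be sketched in simplified form. This is not ComBat or Limma: ComBat uses an empirical Bayes adjustment and Limma fits linear models with moderated statistics, whereas the sketch below only subtracts per-batch means and compares group means. The function names, gene labels, and numbers are hypothetical, the values are assumed to be on a log2 scale, and the groups are assumed to be balanced across batches (otherwise centering by batch would also remove real biological signal).

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Shift one gene's values so that every batch has the same mean.

    Simplified location-only adjustment, NOT ComBat's empirical Bayes method.
    """
    overall = mean(values)
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)
    batch_means = {batch: mean(vals) for batch, vals in by_batch.items()}
    # Remove the batch-specific mean, then restore the gene's overall level.
    return [v - batch_means[b] + overall for v, b in zip(values, batches)]


def rank_by_log_fold_change(expression, groups, case="high", control="low"):
    """Rank genes by absolute difference in mean log2 expression (case - control)."""
    ranked = []
    for gene, values in expression.items():
        case_mean = mean(v for v, g in zip(values, groups) if g == case)
        control_mean = mean(v for v, g in zip(values, groups) if g == control)
        ranked.append((gene, case_mean - control_mean))
    return sorted(ranked, key=lambda item: abs(item[1]), reverse=True)


# Toy data (invented): four samples, two batches, two polyp-number groups.
batches = ["A", "A", "B", "B"]
groups = ["low", "high", "low", "high"]
expression = {
    "GENE_X": [4.0, 7.0, 6.0, 9.0],  # batch B reads 2 units higher than batch A
    "GENE_Y": [5.0, 5.5, 7.0, 7.5],
}

adjusted = {gene: center_by_batch(vals, batches) for gene, vals in expression.items()}
ranked = rank_by_log_fold_change(adjusted, groups)
```

After centering, the batch offset no longer inflates the spread within each group, and `ranked` lists the hypothetical GENE_X first because its high-vs-low difference is the larger of the two.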
For our gene-set analysis, which we used to identify possible key biological pathways in our
colon cancer samples, we used the Gene Set Omic Analysis (GSOA) method, which analyzes
multiple combinations of pathway-level omic data [3]. We used the Hallmark Gene Sets from the
Molecular Signatures Database [4].
For patients with low polyp vs. high polyp numbers, the most differentially expressed genes,
sorted by log fold-change, were XIST, RPS4Y1, DDX3Y, RPS4Y2, UGT2B17, UGT2B15,
DEFA6, OLFM4, and EIF1AY. Loss of XIST gene expression in adults could lead to the
reactivation of some X-chromosome genes that are normally silenced after early development. Loss of
RPS4Y1 is linked to reproductive cell formation failure [5]. For patient-matched no-polyp vs. polyp
tissue, the top results were REG1A, REG3A, INSL5, REG1B, TCN1, KIAA1199, MSX2, and
CLDN1. REG1A is often highly expressed in gastrointestinal cancers and may play a role in
forming tumor-supplying blood vessels [6].
Of particular interest are the results of our pathway analysis, which focused on more complex,
coordinated biological pathways influenced by many interrelated genes. GSOA found two
significantly enriched pathways in the patient-matched polyp tissue samples: Hedgehog
Signaling (p-value = 0.01) and Unfolded Protein Response (p-value = 0.02). Both are known
to play roles in cancer. Pathways influenced by Hedgehog Signaling are typically active during
embryonic development but much less so afterward. They activate and sustain tumor growth [7],
are important in numerous aspects of gastrointestinal maintenance [8], and increase colon cancer
cell invasiveness [9]. Pathways involving the Unfolded Protein Response influence rates of cancer
cell death, growth, and dormancy [10], and are implicated in intestinal inflammation arising from
mitochondrial stress [11].
Discussion and Conclusion
Genetic testing remains expensive and limited in its ability to provide conclusive assurances about
colon cancer risk. The possibility of providing diagnostic results that are both less expensive and
more meaningful for families burdened with inherited colon-cancer risk is an exciting development.
The potential application of these methods to other inherited diseases, especially cancers,
merits further investigation using this type of gene-expression analysis. Identified patterns of
differentially expressed genes may help provide more economical means to diagnose rare colon
cancer syndromes without the need for full genetic testing or the identification of the
responsible inherited genes. The altered biological pathways tied to the Hedgehog Signaling and
Unfolded Protein Response pathways suggest potential for targeted treatment options that may be
explored in future work. This developing method has potential application to other cancers with
known inherited risk, including inherited cancers for which the causal genes have not yet been identified.
1. Leek, J., Johnson, W., Parker, H., Jaffe, A. & Storey, J. The sva package for removing batch
effects and other unwanted variation in high-throughput experiments. Bioinformatics 28,
2. Kerr, M. Linear Models for Microarray Data Analysis: Hidden Similarities and Differences.
Journal of Computational Biology 10, 891-901 (2003).
3. MacNeil, S., Johnson, W., Li, D., Piccolo, S. & Bild, A. Inferring pathway dysregulation in
cancers from multiple types of omic data. Genome Medicine 7, (2015).
4. GSEA | MSigDB. Software.broadinstitute.org (2016). at
5. U.S. National Library of Medicine. Genes. Genetics Home Reference (2016). at
6. Hara, K. et al. Effect of REG Iα protein on angiogenesis in gastric cancer tissues. Oncology
Reports (2015). doi:10.3892/or.2015.3878
7. Barakat, M., Humke, E. & Scott, M. Learning from Jekyll to control Hyde: Hedgehog
signaling in development and cancer. Trends in Molecular Medicine 16, 337-348 (2010).
8. van den Brink, G. Hedgehog Signaling in Development and Homeostasis of the
Gastrointestinal Tract. Physiological Reviews 87, 1343-1375 (2007).
9. Wang, T., Hsu, S., Feng, H. & Huang, R. Folate deprivation enhances invasiveness of human
colon cancer cells mediated by activation of sonic hedgehog signaling through promoter
hypomethylation and cross action with transcription nuclear factor-kappa B pathway.
Carcinogenesis 33, 1158-1168 (2012).
10. Ma, Y. & Hendershot, L. The role of the unfolded protein response in tumour development:
friend or foe? Nature Reviews Cancer 4, 966-977 (2004).
11. Rath, E. et al. Induction of dsRNA-activated protein kinase links mitochondrial unfolded
protein response to the pathogenesis of intestinal inflammation. Gut 61, 1269-1278 (2011).