Ariana Hedges and Dr. W. Evan Johnson, Statistics
With current diagnostic capabilities, two patients with the same cancer diagnosis may respond very differently to the same treatment. Unfortunately, failure to respond to a particular treatment wastes valuable time and can allow the disease to progress, to the patient's detriment. Therefore, diagnoses finely tuned to individual patients would better guide treatment while eliminating much of the trial and error that is often part of current medical practice.
One way to improve diagnostic precision is to better understand the biological pathways in an individual patient. Biological pathways are sequences of gene interactions that together control essential cell processes such as metabolism, cell growth, and programmed cell death (apoptosis). Each pathway has its own unique profile, or signature, defined by the genes that are activated or deactivated through pathway interactions. Importantly, dysfunction of any single gene in a biological pathway may prevent the whole pathway from functioning properly. Thus, examining entire gene pathways would allow a more precise classification of diseases, enabling researchers to classify cancers more accurately and to predict personalized treatments for individual patients. Specifically, we used breast cancer data and examined the RAS, IGFR, EGFR, and PI3K pathways because of their known involvement in breast cancer.
Using existing breast cancer data from collaborating scientists at the University of Utah and Boston University, we analyzed the gene-expression data with a novel barcoding technique that produces a matrix of control and expressed probabilities. From these probabilities, we determined which genes and pathways were expressed in an individual patient. In other words, we had a matrix T containing the averaged gene expression for each pathway when the pathway is activated, and a matrix C containing the averaged gene expression of the controls, in which the pathway is not activated. Extending the basic Bayesian multiple linear regression model, we derived a Bayesian model that analyzes matrices rather than vectors, arriving at the model
$$P = C\alpha + S\beta + e,$$
where P is a matrix (rather than a single vector) of the individuals' gene expression, C is the control matrix, S is the signature matrix containing the difference between pathway-activated gene expression and the control (that is, S = T − C), and e is a normally distributed error matrix. Furthermore, α is a matrix of background strengths, and β is the matrix of pathway activation strengths. We placed normal priors on α and Bernoulli priors on β. To estimate the model parameters, we used Gibbs sampling. From the resulting posterior draws of the betas and deltas, we can then determine which pathways are activated based on the gene signals.
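The estimation step can be illustrated with a small Gibbs sampler. This is a minimal sketch under stated assumptions and not the authors' implementation. It assumes the linear form P = Cα + Sβ + e implied by the definitions above. It assumes the Bernoulli prior enters as a spike-and-slab prior, β_k = δ_k γ_k with δ_k ~ Bernoulli(p) and γ_k normal, which is one way to read the "deltas". It holds the error variance fixed. It also treats each column of P (one individual) separately, given the shared C and S. The function names (`gibbs_one_sample`, `residual`, `sigmoid`), hyperparameter values, and toy data are all hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def residual(p, C, S, a, b, skip_c=None, skip_s=None):
    """p - C a - S b, optionally leaving out column skip_c of C or skip_s of S."""
    r = list(p)
    for k in range(len(C)):
        for i in range(len(p)):
            if k != skip_c:
                r[i] -= C[k][i] * a[k]
            if k != skip_s:
                r[i] -= S[k][i] * b[k]
    return r


def gibbs_one_sample(p, C, S, sigma2=0.01, tau2_alpha=10.0, tau2_beta=4.0,
                     prior_p=0.5, n_iter=2000, burn_in=500, seed=0):
    """Hypothetical Gibbs sampler for one column p of P in p = C alpha + S beta + e.

    C and S are lists of K columns (one per pathway), each of length G.
    Assumed priors: alpha_k ~ N(0, tau2_alpha); beta_k = delta_k * gamma_k with
    delta_k ~ Bernoulli(prior_p) and gamma_k ~ N(0, tau2_beta); e ~ N(0, sigma2 I)
    with sigma2 held fixed. Returns (P(delta_k = 1 | p), E[beta_k | p]) per pathway.
    """
    rng = random.Random(seed)
    K = len(C)
    a, b, d = [0.0] * K, [0.0] * K, [0] * K
    incl, beta_sum, kept = [0.0] * K, [0.0] * K, 0
    for it in range(n_iter):
        for k in range(K):
            # alpha_k | rest: conjugate normal update on the partial residual
            r = residual(p, C, S, a, b, skip_c=k)
            prec = dot(C[k], C[k]) / sigma2 + 1.0 / tau2_alpha
            a[k] = rng.gauss(dot(C[k], r) / sigma2 / prec, math.sqrt(1.0 / prec))
        for k in range(K):
            # delta_k | rest with gamma_k integrated out, then beta_k | delta_k
            r = residual(p, C, S, a, b, skip_s=k)
            v = 1.0 / (dot(S[k], S[k]) / sigma2 + 1.0 / tau2_beta)
            m = v * dot(S[k], r) / sigma2
            log_bf = 0.5 * math.log(v / tau2_beta) + 0.5 * m * m / v
            prob_on = sigmoid(math.log(prior_p / (1.0 - prior_p)) + log_bf)
            d[k] = 1 if rng.random() < prob_on else 0
            b[k] = rng.gauss(m, math.sqrt(v)) if d[k] else 0.0
        if it >= burn_in:
            kept += 1
            for k in range(K):
                incl[k] += d[k]
                beta_sum[k] += b[k]
    return [x / kept for x in incl], [x / kept for x in beta_sum]


# Hypothetical toy data: G = 30 genes, K = 2 pathways; only pathway 0 is active.
gen = random.Random(1)
G, K = 30, 2
C = [[gen.gauss(0.0, 1.0) for _ in range(G)] for _ in range(K)]
T = [[C[k][i] + gen.gauss(0.0, 1.5) for i in range(G)] for k in range(K)]
S = [[T[k][i] - C[k][i] for i in range(G)] for k in range(K)]  # S = T - C
true_alpha, true_beta = [1.0, 0.5], [2.0, 0.0]
p = [sum(C[k][i] * true_alpha[k] + S[k][i] * true_beta[k] for k in range(K))
     + gen.gauss(0.0, 0.1) for i in range(G)]

pip, beta_mean = gibbs_one_sample(p, C, S)
print("P(pathway active):", [round(x, 3) for x in pip])
print("posterior mean beta:", [round(x, 3) for x in beta_mean])
```

The sampler draws δ_k with γ_k integrated out rather than alternating between δ_k and γ_k. This collapsed update generally lets the activation indicator switch on and off more freely. The fraction of post-burn-in draws with δ_k = 1 serves as the posterior probability that pathway k is active.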
However, more work is needed to determine the optimal values of the hyperparameters for the priors on α and β. Because poorly chosen hyperparameter values can drastically affect the outcome, it is important to choose neutral values that do not bias the model. Because β is the main variable of interest, we especially need to find an appropriate value of p for the Bernoulli prior on β that weights the model toward zero, i.e., makes it more likely to classify a pathway as deactivated. To strengthen our model, we also need more information and data on biological interactions so that we can account for the interactive effect of two genes activated simultaneously. As it stands, our current data appear to be too correlated to be predicted accurately by our model. Collaborators at the University of Utah are currently performing experiments to measure this interactive effect.
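Bayes' rule in odds form shows why the choice of p matters. The posterior odds that a pathway is active equal the prior odds p/(1 − p) multiplied by the Bayes factor from the data. The function name and the log-Bayes-factor values below are illustrative only and are not taken from the study.

```python
import math


def posterior_inclusion(prior_p, log_bayes_factor):
    """P(pathway active | data) given a Bernoulli prior P(active) = prior_p.

    Posterior log-odds = prior log-odds + log Bayes factor (active vs. inactive).
    """
    log_odds = math.log(prior_p / (1.0 - prior_p)) + log_bayes_factor
    # numerically stable logistic transform of the log-odds
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


# Weak evidence (log Bayes factor = 1): the prior p shifts the call noticeably.
for prior_p in (0.5, 0.25, 0.1):
    print(prior_p, round(posterior_inclusion(prior_p, 1.0), 3))
# Strong evidence (log Bayes factor = 10) largely overrides the prior.
print(round(posterior_inclusion(0.1, 10.0), 3))
```

With weak evidence, lowering p from 0.5 to 0.1 drops the posterior probability of activation from about 0.73 to about 0.23. A smaller p therefore demands stronger evidence before a pathway is called active. When the data are strongly informative, the prior matters little.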
Beyond strengthening the model, further research is needed to apply the drug response data obtained from collaborators at the Lawrence Berkeley Laboratory (LBL). The goal is to determine which activated pathways are most influential for grouping breast cancer patients into subclasses that respond favorably to specific treatments. Once molecular subtypes based on pathway status are identified in the cancer samples, those subgroups can be used to identify relationships in the LBL drug-outcome data (approximately 60 drugs applied to 43 cell lines) and thereby determine the most effective personalized treatment for each subgroup.
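The planned subgroup analysis could start as simply as the sketch below. It assumes that each cell line has posterior activation probabilities for the four pathways. It assumes a pathway is called active when its probability is at least 0.5, which is an arbitrary, hypothetical threshold. It also assumes each drug outcome is a single numeric response. The function names, cell-line names, drug names, and values are made up for illustration and are not study data.

```python
from collections import defaultdict
from statistics import mean

PATHWAYS = ("RAS", "IGFR", "EGFR", "PI3K")


def pathway_subtype(probs, threshold=0.5):
    """Label a sample by the pathways called active (hypothetical 0.5 cutoff)."""
    return tuple(name for name, pr in zip(PATHWAYS, probs) if pr >= threshold)


def mean_response_by_subtype(activation, response):
    """Average each drug's response within each pathway-status subgroup.

    activation: cell line -> activation probabilities, ordered as PATHWAYS
    response:   cell line -> {drug name: numeric response}
    """
    groups = defaultdict(lambda: defaultdict(list))
    for line, probs in activation.items():
        label = pathway_subtype(probs)
        for drug, value in response[line].items():
            groups[label][drug].append(value)
    return {label: {drug: mean(vals) for drug, vals in drugs.items()}
            for label, drugs in groups.items()}


# Hypothetical toy inputs (not study data).
activation = {
    "line_A": [0.9, 0.1, 0.2, 0.8],
    "line_B": [0.8, 0.3, 0.1, 0.7],
    "line_C": [0.1, 0.2, 0.9, 0.1],
}
response = {
    "line_A": {"drug_1": 0.2, "drug_2": 0.9},
    "line_B": {"drug_1": 0.4, "drug_2": 0.7},
    "line_C": {"drug_1": 0.8, "drug_2": 0.1},
}
for label, drugs in mean_response_by_subtype(activation, response).items():
    print(label, drugs)
```

With real data, each subgroup would contain several cell lines. A formal statistical comparison, rather than raw means, would then be needed to judge whether a subgroup responds differently to a drug.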