James Jensen and Dr. Mark Clement, Department of Computer Science
The project began in the Computational Science Lab in the Computer Science Department, where I was given the assignment of finding something useful to do with a collaborating biologist’s microarray data. Initially, the goal was to identify differentially expressed genes; it later shifted to integrating differential expression analysis with other tools. It was at this stage of the project that I submitted the grant application. However, I eventually abandoned the integration of multiple existing tools in favor of exploring a new idea.
Many methods exist to infer gene regulatory networks from gene expression data. These networks usually resemble a chaotic hairball of nodes and edges. In this complex form, their rich information is not very accessible. I thought that inferring a simpler unit (the most likely chain of regulatory interactions that connects two genes of interest) might have advantages. A linear path may be surrounded by interactions that are also of interest, but it could capture the most important set of interactions in a comprehensible way. The path could be viewed in its network context by taking its union with other paths. And the most exciting idea was that if each path’s likelihood could be estimated, that value could be used to prioritize paths for laboratory validation, saving researchers considerable time and money.
I hoped to reduce regulatory path inference to a shortest path problem. Finding the shortest weighted path in a graph is a solved problem; that is, efficient algorithms exist for doing it. But I needed the shortest path to be interpretable as the most likely path. It took months of researching the literature, several discussions with my older brother (a master’s student in mathematics), and some trial and error to arrive at a sensible way of framing the problem. I had done an early, flawed implementation of the method in Perl for a class project. Now, as my Honors thesis, I set out to implement it correctly in C++, a much faster and more efficient language, and run it on BYU’s supercomputing cluster. Because I had begun to program less than six months earlier, this proved an overwhelming task, but with many late nights, and with much understanding from Honors staff, I managed to complete and defend my thesis in August.
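The idea of reading a shortest path as a most likely path can be illustrated with a small sketch. This is not the thesis implementation, and the report does not say how the problem was finally framed. The sketch uses one common framing as an assumption: each directed edge carries a probability that the regulatory interaction is real, the edges are treated as independent, and each edge is weighted by the negative logarithm of its probability. Under those assumptions, the path with the smallest total weight is the path with the largest product of probabilities, so Dijkstra's algorithm applies. The function name `most_likely_path`, the gene names, and the probabilities are all hypothetical.

```python
import heapq
import math


def most_likely_path(edge_probs, source, target):
    """Return (path, probability) for the most probable chain source -> target.

    edge_probs maps (regulator, regulated) -> probability in (0, 1].
    ASSUMPTION: edges are independent, so a path's probability is the
    product of its edge probabilities. Returns (None, 0.0) if unreachable.
    """
    # Weight each edge by -log(p). Since 0 < p <= 1, every weight is >= 0,
    # which is the condition Dijkstra's algorithm requires.
    graph = {}
    for (u, v), p in edge_probs.items():
        if not 0.0 < p <= 1.0:
            raise ValueError(f"probability for {(u, v)} must be in (0, 1]")
        graph.setdefault(u, []).append((v, -math.log(p)))

    dist = {source: 0.0}   # best known total weight to each node
    prev = {}              # predecessor on the best known path
    done = set()           # nodes whose distance is final
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue       # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in dist:
        return None, 0.0

    # Walk predecessors back from the target to recover the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    # Summed -log weights convert back to a product of probabilities.
    return path, math.exp(-dist[target])


# Hypothetical toy network: the longer chain through geneB is more
# probable (0.9 * 0.8 * 0.9) than the shortcut geneA -> geneC (0.5 * 0.9).
toy_edges = {
    ("geneA", "geneB"): 0.9,
    ("geneB", "geneC"): 0.8,
    ("geneA", "geneC"): 0.5,
    ("geneC", "geneD"): 0.9,
}
path, prob = most_likely_path(toy_edges, "geneA", "geneD")
print(path, round(prob, 3))
```

In this toy network the selected path has more edges than the alternative, which shows that "shortest" here refers to total weight rather than the number of interactions. The returned probability is also the kind of value that could, in principle, be used to rank paths against one another.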
My results were respectable but not earth-shattering. The method generally performed about as well as the two alternatives I compared it to. I was not able to test it on as many different datasets as I would have liked, so it was difficult to draw conclusions from the results. And the path ranking concept didn’t work out, for reasons I still don’t entirely understand. Still, my committee found enough merit in the work to encourage me to submit the paper to the BIOT conference to be held in October at BYU. I wrote a more polished version incorporating their feedback, and it was accepted. It was also accepted for publication in BMC Bioinformatics, since one of its issues each year is composed of papers submitted to this conference.
I was uncomfortable with some of the paper’s shortcomings. Wanting to put out my best work and be confident about it, I decided to incorporate some substantial improvements in the months before the final draft was due. Now a more experienced programmer, I made considerable progress in less time. However, I was busy with the first quarter of a PhD program in bioinformatics at UCSD. I realized that I was not going to be able to finish all my changes or do the final analysis myself before the deadline. Dr. Mark Clement, my advisor and coauthor, had played an exclusively advisory role up to this point, but he offered to continue the work where I had left off. I cleaned up my code, bundled it with some explanations, and handed it over to him.
Here are a few of the lessons I learned from the process:
Meeting deadlines. I repeatedly found myself working frantically up to and past deadlines. Dr. Joel Griffitts, who chaired my thesis committee and is the epitome of punctuality, kindly but firmly talked to me about improving in this area. To be sure, there were extenuating circumstances: I chose this major only a year before graduation and had to squeeze a lot into my final semesters, the supercomputing lab was shut down for maintenance as I was finishing my thesis, and my wife and I had our first child not long before my defense. But rather than trying to do as much as time will possibly allow, I should make a habit of leaving a buffer so that unforeseen circumstances don’t lead to problems, even if that means being a bit less ambitious in my research goals.
Working with my mentor. In the Computational Science Lab, I was expected to be self-directed and self-motivated. I took that to heart, and it has generally served me well, but in some ways I may have been too independent, especially in that I consulted with my advisor only infrequently. Undoubtedly I could have benefitted from his advice, if not on every particular of the research (some of which was outside his expertise), then at least on many aspects of the research process.
Working with collaborators. At the time that I applied for an ORCA grant, three other students had agreed to work on the project as well, so I included them in the application. However, they chose not to work on the project after all, so I essentially did all the work myself, from the ORCA application through to the results and the final report. Had their decision not to work on the project been a formal, explicit one, perhaps we would have contacted ORCA early on to change the arrangement and make me the sole applicant. But the door was always open for them to contribute, and for quite some time they continued to express some intention to do so but never did, being busy with their own classes, internships, and so forth. When we were awarded the grant, I notified them and told them they could pick up their grant money. I am not even sure whether they did. This experience was a lesson in communication and collaboration. Tasks for collaborators should be specific, and collaborators should be held accountable for them. All involved should be realistic about their availability and commitment. And when others are not doing what I had understood they would, I need to contact them and discuss it openly rather than simply forging on with my own share of the work.
Overall, the project was the most enriching educational experience I had at BYU. No class research project had ever given me such freedom to explore my own interests and set my own goals. No teacher had ever pushed me as hard as I ended up pushing myself. The project helped me transition from multiple-choice, fill-in-the-blank learning to the open-ended, self-driven world of academia. I feel strongly that more emphasis and incentive should be given to this kind of learning, and I thank and commend ORCA for their support of undergraduate research.