Dr. William Barrett
During the past year, I have had the opportunity to mentor two undergraduate students while we conducted research on improving technologies used for family history. The specific projects each student worked on, the outcomes of the projects, and the mentoring are described below.
Handwriting Recognition Dataset Creation
We are working on handwriting recognition research that may help reduce the amount of manual indexing required to make genealogical records (such as census images) searchable. We have census images and corresponding information from FamilySearch Indexing. However, the names in the images are not split into parts (surnames, given names, etc.) like they are in the indexed information. We will have to develop methods of automatically splitting the names into parts. To test how well our methods work, we need to compare our results to a dataset that we know has been correctly split into name parts.
The undergraduate student marked the locations of the name parts in a large collection of census images. She also modified existing code and wrote new code to automatically adjust the geometric coordinates of the name parts after the images were aligned to a census template. As a result, with only simple modifications we will be able to use the dataset to test how well our methods are working. In addition to this direct outcome, the student gained experience writing and modifying C++ code and programmatically performing 2-D transformations on geometric points. I helped her understand the existing code, outlined the changes and additions she would need to make, and helped her work through difficulties as they arose.
Discovering Historical Social Networks in Document Transcriptions
The other undergraduate student helped me with research on automatically discovering historical social networks (connections between historical people) by analyzing travel group rosters and transcriptions of portions of diaries relating to Mormons traveling across the oceans and plains. He did a significant amount of programming in C++ and PHP, both modifying existing code and writing additional code. Because he had very little experience with either language, the project helped him learn both. He also found data visualization software and used it to create figures for a paper.
We published this research at an international workshop (Historical Document Imaging and Processing) held in Beijing, China. I involved him in writing the parts of the paper for which he was directly responsible, and he was an author on the paper. In addition to the experience he gained doing research and learning new programming languages and tools, he is now a published author in the workshop proceedings (archived in the ACM Digital Library), which will benefit him when he applies to graduate schools.