David W. Embley, Computer Science
Stephen W. Liddle, Information Systems
Deryle W. Lonsdale, Linguistics and English Language
Yuri Tijerino, Information Technology
On December 29, 2009, we were informed that our application for a 2010 MEG grant had been approved. This final report summarizes what the project has accomplished since then.
The project’s proposed academic objectives were as follows:
- Recruit, bring together, and mentor students from computer science, linguistics, and e-business.
- Enhance our prototype extraction system by developing a multilingual mapping capability.
- Develop a business plan for exploring related tech-entrepreneurship possibilities.
- Build strong peer relationships in a research and development lab-based environment.
- Establish international networking relationships by hosting international faculty and students.
These objectives were achieved in various ways, as summarized point-wise below.
- As indicated in the proposal, our work was interdisciplinary and cross-organizational. Over a dozen BYU students participated on a paid, course-credit, or volunteer basis. A list of their names, affiliations, and contributions is attached as an appendix to this report. They represented two colleges (Physical and Mathematical Sciences, CPMS, and Humanities, CoH) and the Marriott School, and several departments (Linguistics & English Language, Computer Science, Information Systems). Each contributed insights and skills from their disciplines, and the interaction was enlightening for all involved. Mentoring in each area was possible since the supervising faculty represented each of the three major disciplines and affiliations.
- We succeeded in developing an extended prototype of our multilingual mapping system. This involved designing and coding the functionality and requisite knowledge sources. We also described this aspect of our work in several peer-reviewed publications, most of them also presented at international or national venues. A list of the relevant publications is included as an attachment to this document; the full papers and associated presentations are available at our website: http://www.deg.byu.edu. The work is being further pursued beyond the end of this MEG project’s timeframe.
- A group of undergraduate students from the Marriott School joined our group as a part of this project. Their role was to investigate a possible business model for our research. The result was an honors thesis plus a viable plan based in part on our work.
- Our work on this project established and profited from strong peer relationships. Our weekly lab meeting brought together all three faculty members; Yuri Tijerino, our visiting professor on sabbatical from Japan (August 2010 – March 2011); our visiting scholar from Korea; and the undergraduate and graduate students. Our lab also hosted (along with the Computer Science department) several colloquium speakers from other institutions who are widely recognized in our field:
- 2010: Lee Giles from Penn State and Daniel Lopresti from Lehigh University
- 2011: Jeff Pinkston from Microsoft and Michael Cafarella from the University of Michigan
- 2012: George Nagy from Rensselaer Polytechnic Institute
We are also engaging with several local researchers, many from the Church, regarding possible cooperation and future directions. They include Jake Gehring of Family History; Dennis Meldrum and Pat Schoen of FamilySearch; Church service missionaries who worked with a prototype data annotation system; a group from the Church History Library (March 2010); and Lee Gibbons of LDS.org. Additionally, we competed in an international web information extraction programming competition (Web People Search 2010). This was in addition to our engagement with international scholars at the several venues where we presented our work.
- Our project brought together students and faculty from the U.S., Canada, Japan, Korea, PR China, and Nepal. Four students from Asia (two Chinese, one Japanese, and one Nepalese) plus their advisor (Tijerino) visited BYU to work in our lab and collaborate with us during a period of about two weeks. We also hosted an international conference call with potential collaborators at King Fahd University of Petroleum and Minerals in Saudi Arabia. For over a year now we have hosted Byung-Joo Shin, a postdoctoral scholar from Korea whose expertise and contributions to the project include database engineering, publication authorship, and the annotation of Korean knowledge sources.
Mentoring environment evaluation
Our environment was cross-disciplinary, involving work in linguistics, computer science, and business applications. It gave our students wide-ranging exposure to several threads of research and to experts across the spectrum of our activities, most of which went beyond their classroom experience. Many of our students found employment in related areas, often through the networking connections they established while working on this project. Professional and personal mentoring took place in our weekly meetings and in regular one-on-one appointments between faculty and students.
Budget expenditures description
The budget was mostly spent on salaries for the CS students listed in the appendix for their programming work on the developed prototype. Funds budgeted for travel were used to partially offset travel for project-related presentations at conferences: WWW 2010 (Lonsdale, USA), ER 2011 (Liddle, Belgium), HIP 2011 (Embley, PR China), and ER 2012 (Embley, Italy). We also purchased a mobile device for application development as detailed in the proposal.
Scholarly and academic publications based on multilingual extraction ontologies (2010-2012)
- David W. Embley, Stephen W. Liddle, Deryle W. Lonsdale, Joseph S. Park, Byung-Joo Shin, and Andrew Zitzelberger (2012). Cross-Language Hybrid Keyword and Semantic Search. Proceedings of the 31st International Conference on Conceptual Modeling (ER 2012), Florence, Italy, pp. 190-203.
- Deryle W. Lonsdale, David W. Embley, Stephen W. Liddle, and Joseph Park (2012). Extracting information from French obituaries. Proceedings of RootsTech 2012 Family History Technology Workshop (4 pages).
- David W. Embley, Stephen W. Liddle, Deryle W. Lonsdale and Yuri Tijerino (2011). Multilingual Ontologies for Cross-Language Information Extraction and Semantic Search. Lecture Notes in Computer Science, 2011, Volume 6998, Conceptual Modeling – ER 2011, pages 147-160.
- Chad Turner (2011). The Start-up Process: A Case Study in Applying Entrepreneurial Principles Taught in the Marriott School to Ontology-Based Data Extraction, BYU Honors Thesis.
- David W. Embley, Stephen W. Liddle, and Deryle W. Lonsdale (2011). Principled Pragmatism: A Guide to the Adaptation of Ideas from Philosophical Disciplines to Conceptual Modeling. Lecture Notes in Computer Science, 2011, Volume 6999, Advances in Conceptual Modeling. Recent Developments and New Directions, pp. 183-192.
- D.W. Embley, S.W. Liddle, D.W. Lonsdale, S. Machado, T. Packer, J. Park, N. Tate, and A. Zitzelberger (2011). Enabling Search for Facts and Implied Facts in Historical Documents. Proceedings of the International Workshop on Historical Document Imaging and Processing (HIP 2011), Beijing, China, pp. 59-66.
- Andrew Zitzelberger (2011). HyKSS: Hybrid Keyword and Semantic Search, BYU Master’s Thesis.
- David W. Embley, Stephen W. Liddle, and Deryle W. Lonsdale (2011). Conceptual Modeling Foundations for a Web of Knowledge. In: (D.W. Embley & B. Thalheim, Eds.) Handbook of Conceptual Modeling: Theory, Practice, and Research Challenges, pp. 477-516, Springer-Verlag, ISBN 978-3-642-15864-3.
- Charla Woodbury (2010). Automatic Extraction from and Reasoning about Genealogical Records: A Prototype, BYU Master’s Thesis.
- David W. Embley, Stephen W. Liddle, Deryle W. Lonsdale, Aaron Stewart, and Cui Tao (2010). KBB: A Knowledge-Bundle Builder for Research Studies. In: (J. Trujillo, G. Dobbie, H. Kangassalo, S. Hartmann, M. Kirchberg, M. Rossi, I. Reinhartz-Berger, E. Zimányi, & F. Frasincar, Eds.) Advances in Conceptual Modeling – Applications and Challenges: Proceedings of the ER2010 Workshops ACM-L, CMLSA, CMS, DE@ER, FP-UML, SeCoGIS, WISM. Lecture Notes in Computer Science 6413, Springer-Verlag, Berlin, pp. 148-159, ISBN 978-3-642-16384-5.
- Deryle W. Lonsdale, David W. Embley, and Stephen W. Liddle (2010). Ontologies for Multilingual Extraction. Proceedings of the World Wide Web (WWW) Conference’s 1st International Workshop on the Multilingual Semantic Web (MSW 2010), P. Buitelaar, P. Cimiano, and E. Montiel-Ponsoda (Eds.), CEUR Workshop Proceedings, Vol. 571, pp. 1-4, ISSN 1613-0073.
Students who participated, affiliations, and academic deliverables
- Thomas Packer (CS PhD): table processing, grammar acquisition, paper co-author
- Andrew Zitzelberger (CS MS): search/query, Japanese knowledge sources, MS thesis, paper co-author
- Aaron Stewart (CS MS): OCR and named entity extraction, paper co-author
- Joseph Park (CS undergrad, MS): tools design, implementation, system evaluation, paper co-author
- Josh Monson (CS undergrad): annotator
- Aaron Leonard (CS undergrad): annotator
- Nathan Tate (CS undergrad): datalog-like rule evaluator
- Spencer Machado (CS undergrad): programming support, mostly interfaces
- David Boudreau (CS undergrad): annotator, exporter
- Tae Woo Kim (BYU-Hawaii undergrad, BYU CS MA): Korean annotation
- Becca Brinck (Linguistics undergrad): French annotation
- Charla Woodbury (CS MS): first-cut extraction and reasoning tool, MS thesis
- Chad Turner (MarrS, undergrad): business model research, honors thesis
- Derrick Baldwin (MarrS, undergrad): business model research
- Rhett Ferrin (MarrS, undergrad): business model research
- Benjamin Turner (MarrS, undergrad): business model research
- Jordan Calder (MarrS, undergrad): business model research
Note: CS is Computer Science and MarrS is the Marriott School.