1. Recognizing and Transliterating Foreign Names
PhD student: Peter Nabende
Supervisor: Jörg Tiedemann; Promotor: John Nerbonne
The project is concerned with the proper handling of entity names through recognition and transliteration. The recognition task mainly involves identifying word sequences in text that can be considered proper entity names. In particular, we are interested in developing models that improve the identification of pairs of bilingual entity names. Major Natural Language Processing applications that can benefit from bilingual entity name identification include Machine Translation (MT), Cross Language Information Extraction (CLIE) and Cross Language Information Retrieval (CLIR). In the transliteration task, we are interested in models that improve the generation of transliterations across languages that use different writing systems, so as to help deal with unseen terms or Out Of Vocabulary (OOV) words.
P. Nabende defended his thesis on Dec. 2, 2011.
2. Computational Morphology for Bantu Language Learning
PhD student: Fridah Katushemererwe
Supervisor & Promotor: John Nerbonne
A two-level morphology has been developed for Runyakitara, a Bantu language spoken by approximately 5 million people in western Uganda. As in most Bantu languages, the morphology of Runyakitara is extremely complex. The morphology is used within a computer-assisted language learning (CALL) system as an information source for exercises on morphology, and has been shown to lead to improved capabilities. A second set of exercises has been developed on morphosyntax, i.e., grammatical agreement and word order.
Fridah Katushemererwe is scheduled to defend her thesis on June 25, 2013.
3. TermPedia for Automatic Document Enrichment: Providing Contextually Relevant Information for Technical Terms
PhD student: Procovia Olango
Supervisor: Gosse Bouma; Promotor: John Nerbonne
Technical terms and jargon can hinder document comprehension. This project aims to provide relevant contextual information for technical terms through document enrichment. Document enrichment employs natural language processing (NLP) techniques such as automatic term recognition, information extraction, and word sense disambiguation to generate links from technical terms to contextually relevant definitions and background knowledge.
In particular, the project will provide relevant contextual information for technical terms in scholarly documents by linking those terms to their definitions in encyclopedias such as Wikipedia. Both supervised and unsupervised methods for term extraction will be explored. In the next step, all terms need to be linked to their definitions in an encyclopedia. As terms may be ambiguous, it is important to determine the sense of each term as used in the document and to provide a link to the contextually relevant definition. The word "ontology", for instance, has slightly different meanings in philosophy and computer science. If the system encounters the word "ontology" in a computer science text, the computer science meaning (conceptualization of a knowledge domain) should be given, and not the philosophical definition (a subdiscipline of metaphysics).
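The sense selection step described above can be illustrated with a minimal sketch. It is not the project's method: it assumes a simplified, Lesk-style word-overlap heuristic, and the sense inventory `SENSES`, the stopword list, and the function names `tokenize` and `pick_sense` are hypothetical, introduced only for illustration. The glosses paraphrase the two senses of "ontology" mentioned in the text.

```python
import re

# Hypothetical sense inventory (an assumption for illustration only).
# A real system would draw glosses from an encyclopedia such as Wikipedia.
SENSES = {
    "ontology": {
        "computer science": (
            "conceptualization of a knowledge domain, with concepts "
            "and relations used in software and data"
        ),
        "philosophy": (
            "subdiscipline of metaphysics that studies being and existence"
        ),
    }
}

# Small illustrative stopword list, not a complete one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "that", "with",
             "for", "to", "is", "used"}


def tokenize(text):
    """Return the set of lowercase content words in text."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS}


def pick_sense(term, context, senses=SENSES):
    """Pick the sense whose gloss shares the most words with the context."""
    context_words = tokenize(context)
    scores = {
        label: len(tokenize(gloss) & context_words)
        for label, gloss in senses[term].items()
    }
    # On a tie, max() keeps the first sense listed in the inventory.
    return max(scores, key=scores.get)


context = ("We model the knowledge domain as an ontology whose concepts "
           "and relations are stored as data.")
print(pick_sense("ontology", context))
```

For the sample context, the computer science gloss shares several content words with the text while the philosophical gloss shares none, so the computer science sense is selected. Word overlap is only one possible signal; it is used here because it keeps the example self-contained.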
The project is expected to answer the following research questions:
i. How can technical terms be identified in text and how can they be linked accurately to encyclopedic resources?
ii. Does automatic document enrichment improve understanding of technical documents?
iii. Does automatic document enrichment reduce the time required to acquire knowledge and understanding?
The first question will be investigated using various NLP techniques and various resources, such as Wikipedia and the Unified Medical Language System (UMLS). The second and third research questions will be investigated in user studies.