Full Text Available
Note: Clicking the button above will open the full text document at the original institutional repository in a new window.
Natural Language Generation (NLG) systems have been developed to generate text in multiple domains, including personalized patient information. However, their application is limited in Africa because they generate text in English, yet indigenous languages are still predominantly spoken throughout th...
| Main Author: | |
|---|---|
| Other Authors: | |
| Format: | Thesis |
| Language: | English |
| Published: |
Department of Computer Science
2020
|
| Subjects: | |
| Tags: |
No Tags, Be the first to tag this record!
|
| _version_ | 1867613294990196736 |
|---|---|
| access_status_str | Open Access |
| author | Byamugisha, Joan |
| author2 | Keet, Catharina Maria |
| author_browse | Byamugisha, Joan Keet, Catharina Maria |
| author_facet | Keet, Catharina Maria Byamugisha, Joan |
| author_sort | Byamugisha, Joan |
| collection | Thesis |
| description | Natural Language Generation (NLG) systems have been developed to generate text in multiple domains, including personalized patient information. However, their application is limited in Africa because they generate text in English, yet indigenous languages are still predominantly spoken throughout the continent, especially in rural areas. The existing healthcare NLG systems cannot be reused for Bantu languages due to the complex grammatical structure, nor can the generated text be used in machine translation systems for Bantu languages because they are computationally under-resourced. This research aimed to verbalize ontologies in agglutinating Bantu languages. We had four research objectives: (1) noun pluralization and verb conjugation in Runyankore; (2) Runyankore verbalization patterns for the selected description logic constructors; (3) combining the pluralization, conjugation, and verbalization components to form a Runyankore grammar engine; and (4) generalizing the Runyankore and isiZulu approaches to ontology verbalization to other agglutinating Bantu languages. We used an approach that combines morphology with syntax and semantics to develop a noun pluralizer for Runyankore, and used Context-Free Grammars (CFGs) for verb conjugation. We developed verbalization algorithms for eight constructors in a description logic. We then combined these components into a grammar engine developed as a Protégé5X plugin. The investigation into generalizability used the bootstrap approach, and investigated bootstrapping for languages in the same language zone (intra-zone bootstrappability) and languages across language zones (inter-zone bootstrappability). We obtained verbalization patterns for Luganda and isiXhosa, in the same zones as Runyankore and isiZulu respectively, and chiShona, Kikuyu, and Kinyarwanda from different zones, and used the bootstrap metric that we developed to identify the most efficient source—target bootstrap pair. By regrouping Meinhof’s noun class system we were able to eliminate non-determinism during computation, and this led to the development of a generic noun pluralizer. We also showed that CFGs can conjugate verbs in the five additional languages. Finally, we proposed the architecture for an API that could be used to generate text in agglutinating Bantu languages. Our research provides a method for surface realization for an under-resourced and grammatically complex family of languages, Bantu languages. We leave the development of a complete NLG system based on the Runyankore grammar engine and of the API as areas for future work. |
| format | Thesis |
| id | oai:open.uct.ac.za:11427/31480 |
| institution | University of Cape Town (South Africa) |
| language | eng |
| last_indexed | 2026-06-10T12:33:51.607Z |
| license_str | Not specified — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository |
| publishDate | 2020 |
| publishDateRange | 2020 |
| publishDateSort | 2020 |
| publisher | Department of Computer Science |
| publisherStr | Department of Computer Science |
| record_format | dspace |
| source_str | UCTD — University of Cape Town Open Access Repository |
| spelling | oai:open.uct.ac.za:11427/31480 Ontology verbalization in agglutinating Bantu languages: a study of Runyankore and its generalizability Byamugisha, Joan Keet, Catharina Maria Brian DeRenzi computer science Natural Language Generation (NLG) systems have been developed to generate text in multiple domains, including personalized patient information. However, their application is limited in Africa because they generate text in English, yet indigenous languages are still predominantly spoken throughout the continent, especially in rural areas. The existing healthcare NLG systems cannot be reused for Bantu languages due to the complex grammatical structure, nor can the generated text be used in machine translation systems for Bantu languages because they are computationally under-resourced. This research aimed to verbalize ontologies in agglutinating Bantu languages. We had four research objectives: (1) noun pluralization and verb conjugation in Runyankore; (2) Runyankore verbalization patterns for the selected description logic constructors; (3) combining the pluralization, conjugation, and verbalization components to form a Runyankore grammar engine; and (4) generalizing the Runyankore and isiZulu approaches to ontology verbalization to other agglutinating Bantu languages. We used an approach that combines morphology with syntax and semantics to develop a noun pluralizer for Runyankore, and used Context-Free Grammars (CFGs) for verb conjugation. We developed verbalization algorithms for eight constructors in a description logic. We then combined these components into a grammar engine developed as a Protégé5X plugin. The investigation into generalizability used the bootstrap approach, and investigated bootstrapping for languages in the same language zone (intra-zone bootstrappability) and languages across language zones (inter-zone bootstrappability). We obtained verbalization patterns for Luganda and isiXhosa, in the same zones as Runyankore and isiZulu respectively, and chiShona, Kikuyu, and Kinyarwanda from different zones, and used the bootstrap metric that we developed to identify the most efficient source—target bootstrap pair. By regrouping Meinhof’s noun class system we were able to eliminate non-determinism during computation, and this led to the development of a generic noun pluralizer. We also showed that CFGs can conjugate verbs in the five additional languages. Finally, we proposed the architecture for an API that could be used to generate text in agglutinating Bantu languages. Our research provides a method for surface realization for an under-resourced and grammatically complex family of languages, Bantu languages. We leave the development of a complete NLG system based on the Runyankore grammar engine and of the API as areas for future work. 2020-03-04T11:37:27Z 2020-03-04T11:37:27Z 2019 2020-03-04T11:33:35Z Doctoral Thesis Doctoral PhD http://hdl.handle.net/11427/31480 eng application/pdf Department of Computer Science Faculty of Science |
| spellingShingle | computer science Byamugisha, Joan Ontology verbalization in agglutinating Bantu languages: a study of Runyankore and its generalizability |
| thesis_degree_str | Doctoral |
| title | Ontology verbalization in agglutinating Bantu languages: a study of Runyankore and its generalizability |
| title_full | Ontology verbalization in agglutinating Bantu languages: a study of Runyankore and its generalizability |
| title_fullStr | Ontology verbalization in agglutinating Bantu languages: a study of Runyankore and its generalizability |
| title_full_unstemmed | Ontology verbalization in agglutinating Bantu languages: a study of Runyankore and its generalizability |
| title_short | Ontology verbalization in agglutinating Bantu languages: a study of Runyankore and its generalizability |
| title_sort | ontology verbalization in agglutinating bantu languages a study of runyankore and its generalizability |
| topic | computer science |
| url | http://hdl.handle.net/11427/31480 |
| work_keys_str_mv | AT byamugishajoan ontologyverbalizationinagglutinatingbantulanguagesastudyofrunyankoreanditsgeneralizability |