Ontology verbalization in agglutinating Bantu languages: a study of Runyankore and its generalizability

Bibliographic Details
Main Author: Byamugisha, Joan
Other Authors: Keet, Catharina Maria
Format: Thesis
Language: English
Published: Department of Computer Science 2020
Subjects: computer science
access_status_str Open Access
author Byamugisha, Joan
author2 Keet, Catharina Maria
collection Thesis
description Natural Language Generation (NLG) systems have been developed to generate text in multiple domains, including personalized patient information. However, their application in Africa is limited because they generate text in English, whereas indigenous languages are still predominantly spoken throughout the continent, especially in rural areas. Existing healthcare NLG systems cannot be reused for Bantu languages because of the languages' complex grammatical structure. Nor can their generated text be passed through machine translation systems for Bantu languages, because these languages are computationally under-resourced. This research aimed to verbalize ontologies in agglutinating Bantu languages. We had four research objectives: (1) noun pluralization and verb conjugation in Runyankore; (2) Runyankore verbalization patterns for the selected description logic constructors; (3) combining the pluralization, conjugation, and verbalization components to form a Runyankore grammar engine; and (4) generalizing the Runyankore and isiZulu approaches to ontology verbalization to other agglutinating Bantu languages. We used an approach that combines morphology with syntax and semantics to develop a noun pluralizer for Runyankore, and used Context-Free Grammars (CFGs) for verb conjugation. We developed verbalization algorithms for eight constructors in a description logic. We then combined these components into a grammar engine developed as a Protégé5X plugin. The investigation into generalizability used a bootstrapping approach. It examined bootstrapping between languages in the same language zone (intra-zone bootstrappability) and between languages in different zones (inter-zone bootstrappability). We obtained verbalization patterns for Luganda and isiXhosa, which are in the same zones as Runyankore and isiZulu respectively. We also obtained patterns for chiShona, Kikuyu, and Kinyarwanda, which are in different zones. We then used a bootstrap metric that we developed to identify the most efficient source–target bootstrap pair.
By regrouping Meinhof’s noun class system, we were able to eliminate non-determinism during computation, which led to the development of a generic noun pluralizer. We also showed that CFGs can conjugate verbs in the five additional languages. Finally, we proposed an architecture for an API that could be used to generate text in agglutinating Bantu languages. Our research provides a method for surface realization for an under-resourced and grammatically complex family of languages, the Bantu languages. We leave the development of a complete NLG system based on the Runyankore grammar engine, and the development of the API, as future work.
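The description names two computational components: a noun pluralizer that becomes deterministic once the noun classes are regrouped, and CFG-based verb conjugation. The Python sketch below illustrates both ideas under stated assumptions and is not the thesis's implementation. The prefix table, the `is_human` flag, the example nouns, and the toy grammar (including the root "gyend") are all hypothetical. The flag is used here to split the ambiguous singular prefix omu- (shared by Meinhof classes 1 and 3) so that each key maps to a single plural prefix.

```python
from itertools import product

# Hypothetical regrouped pluralization rules, keyed by (singular prefix, is_human).
# Splitting "omu-" on a semantic flag removes the class 1 / class 3 ambiguity,
# so each key yields exactly one plural prefix (illustrative assumption).
PLURAL_RULES = {
    ("omu", True): "aba",   # class 1 -> class 2, human nouns (assumed)
    ("omu", False): "emi",  # class 3 -> class 4, non-human nouns (assumed)
    ("eki", False): "ebi",  # class 7 -> class 8 (assumed)
}


def pluralize(noun: str, is_human: bool = False) -> str:
    """Replace the singular prefix with its plural counterpart."""
    for (prefix, human), plural in PLURAL_RULES.items():
        if human == is_human and noun.startswith(prefix):
            return plural + noun[len(prefix):]
    raise ValueError(f"no pluralization rule for {noun!r}")


# Toy context-free grammar: each nonterminal maps to a list of alternative
# right-hand sides. Any symbol that is not a key is treated as a terminal morpheme.
# The morphemes are illustrative placeholders, not the thesis's grammar.
GRAMMAR = {
    "VERB": [["SC", "ROOT", "FV"]],  # subject concord + root + final vowel
    "SC": [["a"], ["ba"]],           # class 1 / class 2 subject concords (assumed)
    "ROOT": [["gyend"]],             # hypothetical verb root
    "FV": [["a"]],                   # final vowel
}


def expand(symbol: str) -> list[str]:
    """Enumerate every string the grammar derives from `symbol`."""
    if symbol not in GRAMMAR:  # terminal morpheme: derives only itself
        return [symbol]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine one expansion of each right-hand-side symbol, left to right.
        for parts in product(*(expand(s) for s in rhs)):
            results.append("".join(parts))
    return results
```

For example, `pluralize("omuntu", is_human=True)` swaps omu- for aba-, and `expand("VERB")` enumerates one verb form per subject concord. In a real system the concord would be selected by the subject noun's class rather than enumerated exhaustively.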
format Thesis
id oai:open.uct.ac.za:11427/31480
institution University of Cape Town (South Africa)
language eng
last_indexed 2026-06-10T12:33:51.607Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate 2020
publisher Department of Computer Science
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/31480 Ontology verbalization in agglutinating Bantu languages: a study of Runyankore and its generalizability Byamugisha, Joan Keet, Catharina Maria Brian DeRenzi computer science 2020-03-04T11:37:27Z 2020-03-04T11:37:27Z 2019 2020-03-04T11:33:35Z Doctoral Thesis Doctoral PhD http://hdl.handle.net/11427/31480 eng application/pdf Department of Computer Science Faculty of Science
thesis_degree_str Doctoral
title Ontology verbalization in agglutinating Bantu languages: a study of Runyankore and its generalizability
topic computer science
url http://hdl.handle.net/11427/31480