Documents detailing South African history contain ambiguous names. Ambiguous names may be due to people having the same name or the same person being referred to by multiple different names. Thus when searching for or attempting to extract information about a particular person, the name used may aff...
| Main Author: | Dunn, Jarryd |
|---|---|
| Other Authors: | Suleman, Hussein |
| Format: | Thesis |
| Language: | English |
| Published: | Department of Statistical Sciences, 2023 |
| Subjects: | data science |
| _version_ | 1867613248119898112 |
|---|---|
| access_status_str | Open Access |
| author | Dunn, Jarryd |
| author2 | Suleman, Hussein |
| author_browse | Dunn, Jarryd Suleman, Hussein |
| author_facet | Suleman, Hussein Dunn, Jarryd |
| author_sort | Dunn, Jarryd |
| collection | Thesis |
| description | Documents detailing South African history contain ambiguous names. Ambiguous names may be due to people having the same name or the same person being referred to by multiple different names. Thus when searching for or attempting to extract information about a particular person, the name used may affect the results. This problem may be alleviated by using a Named Entity Disambiguation (NED) system to disambiguate names by linking them to a knowledge base. In recent years, transformer-based language models have led to improvements in NED systems. Furthermore, multilingual language models have shown the ability to learn concepts across languages, reducing the amount of training data required in low-resource languages. Thus a multilingual language model-based NED system was developed to disambiguate people's names within a historical South African context using documents written in English and isiZulu from the 500 Year Archive (FHYA). The multilingual language model-based system substantially improved on a probability-based baseline and achieved a micro F1-score of 0.726. At the same time, the entity linking component was able to link 81.9% of the mentions to the correct entity. However, the system's performance on documents written in isiZulu was significantly lower than on the documents written in English. Thus the system was augmented with handcrafted rules to improve its performance. The addition of handcrafted rules resulted in a small but significant improvement in performance when compared to the unaugmented NED system. |
| format | Thesis |
| id | oai:open.uct.ac.za:11427/36921 |
| institution | University of Cape Town (South Africa) |
| language | eng |
| last_indexed | 2026-06-10T12:33:07.122Z |
| license_str | Not specified — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository |
| publishDate | 2023 |
| publishDateRange | 2023 |
| publishDateSort | 2023 |
| publisher | Department of Statistical Sciences |
| publisherStr | Department of Statistical Sciences |
| record_format | dspace |
| source_str | UCTD — University of Cape Town Open Access Repository |
| spelling | oai:open.uct.ac.za:11427/36921 Evaluating automated and hybrid neural disambiguation for African historical named entities Dunn, Jarryd Suleman, Hussein data science Documents detailing South African history contain ambiguous names. Ambiguous names may be due to people having the same name or the same person being referred to by multiple different names. Thus when searching for or attempting to extract information about a particular person, the name used may affect the results. This problem may be alleviated by using a Named Entity Disambiguation (NED) system to disambiguate names by linking them to a knowledge base. In recent years, transformer-based language models have led to improvements in NED systems. Furthermore, multilingual language models have shown the ability to learn concepts across languages, reducing the amount of training data required in low-resource languages. Thus a multilingual language model-based NED system was developed to disambiguate people's names within a historical South African context using documents written in English and isiZulu from the 500 Year Archive (FHYA). The multilingual language model-based system substantially improved on a probability-based baseline and achieved a micro F1-score of 0.726. At the same time, the entity linking component was able to link 81.9% of the mentions to the correct entity. However, the system's performance on documents written in isiZulu was significantly lower than on the documents written in English. Thus the system was augmented with handcrafted rules to improve its performance. The addition of handcrafted rules resulted in a small but significant improvement in performance when compared to the unaugmented NED system. 2023-02-15T06:44:36Z 2023-02-15T06:44:36Z 2022 2023-02-15T06:43:46Z Master Thesis Masters MSc http://hdl.handle.net/11427/36921 eng application/pdf Department of Statistical Sciences Faculty of Science |
| spellingShingle | data science Dunn, Jarryd Evaluating automated and hybrid neural disambiguation for African historical named entities |
| thesis_degree_str | Master's |
| title | Evaluating automated and hybrid neural disambiguation for African historical named entities |
| title_full | Evaluating automated and hybrid neural disambiguation for African historical named entities |
| title_fullStr | Evaluating automated and hybrid neural disambiguation for African historical named entities |
| title_full_unstemmed | Evaluating automated and hybrid neural disambiguation for African historical named entities |
| title_short | Evaluating automated and hybrid neural disambiguation for African historical named entities |
| title_sort | evaluating automated and hybrid neural disambiguation for african historical named entities |
| topic | data science |
| url | http://hdl.handle.net/11427/36921 |
| work_keys_str_mv | AT dunnjarryd evaluatingautomatedandhybridneuraldisambiguationforafricanhistoricalnamedentities |