Evaluating automated and hybrid neural disambiguation for African historical named entities

Documents detailing South African history contain ambiguous names. Ambiguous names may be due to people having the same name or the same person being referred to by multiple different names. Thus when searching for or attempting to extract information about a particular person, the name used may aff...

Bibliographic Details
Main Author: Dunn, Jarryd
Other Authors: Suleman, Hussein
Format: Thesis
Language: English
Published: Department of Statistical Sciences 2023
Subjects: data science
_version_ 1867613248119898112
access_status_str Open Access
author Dunn, Jarryd
author2 Suleman, Hussein
author_browse Dunn, Jarryd
Suleman, Hussein
author_facet Suleman, Hussein
Dunn, Jarryd
author_sort Dunn, Jarryd
collection Thesis
description Documents detailing South African history contain ambiguous names. Ambiguous names may be due to people having the same name or the same person being referred to by multiple different names. Thus when searching for or attempting to extract information about a particular person, the name used may affect the results. This problem may be alleviated by using a Named Entity Disambiguation (NED) system to disambiguate names by linking them to a knowledge base. In recent years, transformer-based language models have led to improvements in NED systems. Furthermore, multilingual language models have shown the ability to learn concepts across languages, reducing the amount of training data required in low-resource languages. Thus a multilingual language model-based NED system was developed to disambiguate people's names within a historical South African context using documents written in English and isiZulu from the 500 Year Archive (FHYA). The multilingual language model-based system substantially improved on a probability-based baseline and achieved a micro F1-score of 0.726. At the same time, the entity linking component was able to link 81.9% of the mentions to the correct entity. However, the system's performance on documents written in isiZulu was significantly lower than on the documents written in English. Thus the system was augmented with handcrafted rules to improve its performance. The addition of handcrafted rules resulted in a small but significant improvement in performance when compared to the unaugmented NED system.
format Thesis
id oai:open.uct.ac.za:11427/36921
institution University of Cape Town (South Africa)
language eng
last_indexed 2026-06-10T12:33:07.122Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate 2023
publishDateRange 2023
publishDateSort 2023
publisher Department of Statistical Sciences
publisherStr Department of Statistical Sciences
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/36921 Evaluating automated and hybrid neural disambiguation for African historical named entities Dunn, Jarryd Suleman, Hussein data science Documents detailing South African history contain ambiguous names. Ambiguous names may be due to people having the same name or the same person being referred to by multiple different names. Thus when searching for or attempting to extract information about a particular person, the name used may affect the results. This problem may be alleviated by using a Named Entity Disambiguation (NED) system to disambiguate names by linking them to a knowledge base. In recent years, transformer-based language models have led to improvements in NED systems. Furthermore, multilingual language models have shown the ability to learn concepts across languages, reducing the amount of training data required in low-resource languages. Thus a multilingual language model-based NED system was developed to disambiguate people's names within a historical South African context using documents written in English and isiZulu from the 500 Year Archive (FHYA). The multilingual language model-based system substantially improved on a probability-based baseline and achieved a micro F1-score of 0.726. At the same time, the entity linking component was able to link 81.9% of the mentions to the correct entity. However, the system's performance on documents written in isiZulu was significantly lower than on the documents written in English. Thus the system was augmented with handcrafted rules to improve its performance. The addition of handcrafted rules resulted in a small but significant improvement in performance when compared to the unaugmented NED system. 2023-02-15T06:44:36Z 2023-02-15T06:44:36Z 2022 2023-02-15T06:43:46Z Master Thesis Masters MSc http://hdl.handle.net/11427/36921 eng application/pdf Department of Statistical Sciences Faculty of Science
spellingShingle data science
Dunn, Jarryd
Evaluating automated and hybrid neural disambiguation for African historical named entities
thesis_degree_str Master's
title Evaluating automated and hybrid neural disambiguation for African historical named entities
title_sort evaluating automated and hybrid neural disambiguation for african historical named entities
topic data science
url http://hdl.handle.net/11427/36921
work_keys_str_mv AT dunnjarryd evaluatingautomatedandhybridneuraldisambiguationforafricanhistoricalnamedentities