Speech recognition of South African English accents

Thesis (MScEng)--Stellenbosch University, 2012.

Bibliographic Details
Main Author:	Kamper, Herman
Other Authors:	Niesler, T. R.
Format:	Thesis
Language:	en_ZA
Published:	Stellenbosch : Stellenbosch University, 2012
Subjects:	Speech recognition; South African English; Multiple accents; Accent identification; Dissertations > Electronic engineering; Theses > Electronic engineering

_version_	1867614033710940160
access_status_str	Open Access
author	Kamper, Herman
author2	Niesler, T. R.
author_browse	Kamper, Herman Niesler, T. R.
author_facet	Niesler, T. R. Kamper, Herman
author_sort	Kamper, Herman
collection	Thesis
dc_rights_str_mv	Stellenbosch University
description	Thesis (MScEng)--Stellenbosch University, 2012.
format	Thesis
id	oai:scholar.sun.ac.za:10019.1/20249
institution	Stellenbosch University (South Africa)
language	en_ZA
last_indexed	2026-06-10T12:45:36.533Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate	2012
publishDateRange	2012
publishDateSort	2012
publisher	Stellenbosch : Stellenbosch University
publisherStr	Stellenbosch : Stellenbosch University
record_format	dspace
source_str	SUNScholar — Stellenbosch University Repository
spelling	oai:scholar.sun.ac.za:10019.1/20249 Speech recognition of South African English accents Kamper, Herman Niesler, T. R. Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. Speech recognition South African English Multiple accents Accent identification Dissertations -- Electronic engineering Theses -- Electronic engineering Thesis (MScEng)--Stellenbosch University, 2012. ENGLISH ABSTRACT: Several accents of English are spoken in South Africa. Automatic speech recognition (ASR) systems should therefore be able to process the different accents of South African English (SAE). In South Africa, however, system development is hampered by the limited availability of speech resources. In this thesis we consider different acoustic modelling approaches and system configurations in order to determine which strategies take best advantage of a limited corpus of the five accents of SAE for the purpose of ASR. Three acoustic modelling approaches are considered: (i) accent-specific modelling, in which accents are modelled separately; (ii) accent-independent modelling, in which acoustic training data is pooled across accents; and (iii) multi-accent modelling, which allows selective data sharing between accents. For the latter approach, selective sharing is enabled by extending the decision-tree state clustering process normally used to construct tied-state hidden Markov models (HMMs) by allowing accent-based questions. In a first set of experiments, we investigate phone and word recognition performance achieved by the three modelling approaches in a configuration where the accent of each test utterance is assumed to be known. Each utterance is therefore presented only to the matching model set. We show that, in terms of best recognition performance, the decision of whether to separate or to pool training data depends on the particular accents in question. 
Multi-accent acoustic modelling, however, allows this decision to be made automatically in a data-driven manner. When modelling the five accents of SAE, multi-accent models yield a statistically significant improvement of 1.25% absolute in word recognition accuracy over accent-specific and accent-independent models. In a second set of experiments, we consider the practical scenario where the accent of each test utterance is assumed to be unknown. Each utterance is presented simultaneously to a bank of recognisers, one for each accent, running in parallel. In this setup, accent identification is performed implicitly during the speech recognition process. A system employing multi-accent acoustic models in this parallel configuration is shown to achieve slightly improved performance relative to the configuration in which the accents are known. This demonstrates that accent identification errors made during the parallel recognition process do not affect recognition performance. Furthermore, the parallel approach is also shown to outperform an accent-independent system obtained by pooling acoustic and language model training data. In a final set of experiments, we consider the unsupervised reclassification of training set accent labels. Accent labels are assigned by human annotators based on a speaker's mother-tongue or ethnicity. These might not be optimal for modelling purposes. By classifying the accent of each utterance in the training set by using first-pass acoustic models and then retraining the models, reclassified acoustic models are obtained. We show that the proposed relabelling procedure does not lead to any improvements and that training on the originally labelled data remains the best approach. AFRIKAANSE OPSOMMING: Verskeie aksente van Engels word in Suid Afrika gepraat. Outomatiese spraakherkenningstelsels moet dus in staat wees om verskillende aksente van Suid Afrikaanse Engels (SAE) te kan hanteer. 
In Suid Afrika word die ontwikkeling van spraakherkenningstegnologie egter deur die beperkte beskikbaarheid van geannoteerde spraakdata belemmer. In hierdie tesis ondersoek ons verskillende akoestiese modelleringstegnieke en stelselkonfigurasies ten einde te bepaal watter strategieë die beste gebruik maak van 'n databasis van die vyf aksente van SAE. Drie akoestiese modelleringstegnieke word ondersoek: (i) aksent-spesifieke modellering, waarin elke aksent apart gemodelleer word; (ii) aksent-onafhanklike modellering, waarin die akoestiese afrigdata van verskillende aksente saamgegooi word; en (iii) multi-aksent modellering, waarin data selektief tussen aksente gedeel word. Vir laasgenoemde word selektiewe deling moontlik gemaak deur die besluitnemingsboom-toestandbondeling-algoritme, wat gebruik word in die afrig van gebinde-toestand verskuilde Markov-modelle, uit te brei deur aksent-gebaseerde vrae toe te laat. In 'n eerste stel eksperimente word die foon- en woordherkenningsakkuraathede van die drie modelleringstegnieke vergelyk in 'n konfigurasie waarin daar aanvaar word dat die aksent van elke toetsspraakdeel bekend is. In hierdie konfigurasie word elke spraakdeel slegs gebied aan die modelstel wat ooreenstem met die aksent van die spraakdeel. In terme van herkenningsakkuraathede, wys ons dat die keuse tussen aksent-spesifieke en aksent-onafhanklike modellering afhanklik is van die spesifieke aksente wat ondersoek word. Multi-aksent akoestiese modellering stel ons egter in staat om hierdie besluit outomaties op 'n data-gedrewe wyse te neem. Vir die modellering van die vyf aksente van SAE lewer multi-aksent modelle 'n statisties beduidende verbetering van 1.25% absoluut in woordherkenningsakkuraatheid op in vergelyking met aksent-spesifieke en aksent-onafhanklike modelle. In 'n tweede stel eksperimente word die praktiese scenario ondersoek waar daar aanvaar word dat die aksent van elke toetsspraakdeel onbekend is. 
Elke spraakdeel word gelyktydig gebied aan 'n stel herkenners, een vir elke aksent, wat in parallel hardloop. In hierdie opstelling word aksentidentifikasie implisiet uitgevoer. Ons vind dat 'n stelsel wat multi-aksent akoestiese modelle in parallel inspan, effense verbeterde werkverrigting toon in vergelyking met die opstelling waar die aksent bekend is. Dit dui daarop dat aksentidentifiseringsfoute wat gemaak word gedurende herkenning, nie werkverrigting beïnvloed nie. Verder wys ons dat die parallelle benadering ook beter werkverrigting toon as 'n aksent-onafhanklike stelsel wat verkry word deur akoestiese en taalmodelleringsafrigdata saam te gooi. In 'n finale stel eksperimente ondersoek ons die ongekontroleerde herklassifikasie van aksenttoekennings van die spraakdele in ons afrigstel. Aksente word gemerk deur menslike transkribeerders op grond van 'n spreker se moedertaal en ras. Hierdie toekennings is nie noodwendig optimaal vir modelleringsdoeleindes nie. Deur die aksent van elke spraakdeel in die afrigstel te klassifiseer deur van aanvanklike akoestiese modelle gebruik te maak en dan weer modelle af te rig, word hergeklassifiseerde akoestiese modelle verkry. Ons wys dat die voorgestelde herklassifiseringsalgoritme nie tot enige verbeterings lei nie en dat dit die beste is om modelle op die oorspronklike data af te rig. 2012-03-09T10:42:05Z 2012-03-30T10:56:48Z 2012-03-09T10:42:05Z 2012-03-30T10:56:48Z 2012-03 Thesis http://hdl.handle.net/10019.1/20249 en_ZA Stellenbosch University 124 p. : ill. application/pdf Stellenbosch : Stellenbosch University
spellingShingle	Speech recognition South African English Multiple accents Accent identification Dissertations -- Electronic engineering Theses -- Electronic engineering Kamper, Herman Speech recognition of South African English accents
title	Speech recognition of South African English accents
title_full	Speech recognition of South African English accents
title_fullStr	Speech recognition of South African English accents
title_full_unstemmed	Speech recognition of South African English accents
title_short	Speech recognition of South African English accents
title_sort	speech recognition of south african english accents
topic	Speech recognition South African English Multiple accents Accent identification Dissertations -- Electronic engineering Theses -- Electronic engineering
url	http://hdl.handle.net/10019.1/20249
work_keys_str_mv	AT kamperherman speechrecognitionofsouthafricanenglishaccents

Similar Items