Full Text Available

Predicting household poverty with machine learning methods: the case of Malawi

Bibliographic Details
Main Author:	Chinyama, Francis
Other Authors:	Berman, Sonia
Format:	Thesis
Language:	English
Published:	Department of Computer Science 2022
Subjects:	computer science

access_status_str	Open Access
author	Chinyama, Francis
author2	Berman, Sonia
author_sort	Chinyama, Francis
collection	Thesis
description	Poverty alleviation remains paramount for developing countries, which creates a need for poverty-tracking tools to monitor progress towards this goal and to enable timely interventions. One major way poverty has been tracked in Malawi is through integrated household surveys, carried out every five years to quantify poverty at local and national levels. However, such surveys have been documented as very expensive and tedious, and many low- and middle-income countries administer them only infrequently. This study therefore examined whether machine learning models trained on existing survey data can predict poor and non-poor households, and whether they can do so using fewer features than those collected in integrated household surveys. To this end, it compared the performance of three off-the-shelf, open-source machine learning classification algorithms, namely Logistic Regression, Extreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM), in correctly classifying poor and non-poor households from Malawi survey data. These supervised learning algorithms were trained using 10-fold cross-validation. Experiments were run on the full panel of features, which represents all the questions asked in a household survey, as well as on smaller feature subsets. A filter method and the SHapley Additive exPlanations (SHAP) method were used to rank the features by importance, and the smaller feature subsets were selected from these rankings. The highest prediction accuracy on the full panel of 486 features was 87%. With the filter-method rankings, the models' accuracy dropped to 63% for the top-50 feature subset. With the SHAP rankings, by contrast, accuracy stayed close to the full-feature maximum, dropping only slightly to 86% with the top 50 features and to 84% with the top 20, before falling to 73% with the top 10.
The area under the curve (AUC), receiver operating characteristic (ROC) curves, recall, precision, F1 score, Matthews correlation coefficient (MCC) and Cohen's kappa scores confirmed the classification models' reliability. The study therefore established that open-source machine learning algorithms can predict poverty from a substantially reduced number of features with accuracy comparable to that obtained with the full feature set. A resulting policy recommendation is to include only the top explanatory features in surveys. This would enable shorter, lower-cost surveys that can be administered more frequently, with the aim of helping policymakers and aid organisations to intervene in a more timely way and to better target the poorest.
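The feature-ranking and evaluation steps summarised in the description can be illustrated with a minimal, dependency-free Python sketch. It is not the thesis's code. It assumes SHAP values have already been computed for a fitted model and ranks features by mean absolute SHAP value, a common global-importance summary that may differ from the exact aggregation used in the study. The fold splitter is a plain, unshuffled, unstratified k-fold, model training is omitted, and all function names, feature names and values are hypothetical.

```python
# Hypothetical sketch, not the thesis's implementation. Pure standard library.
import math
from typing import Dict, List, Sequence, Tuple


def rank_by_mean_abs_shap(shap_values: Sequence[Sequence[float]],
                          feature_names: Sequence[str]) -> List[str]:
    """Order features by mean |SHAP value| across households, largest first.

    shap_values[i][j] is the precomputed SHAP contribution of feature j
    to the model's output for household i (assumed input format).
    """
    n_rows = len(shap_values)
    scores = [sum(abs(row[j]) for row in shap_values) / n_rows
              for j in range(len(feature_names))]
    order = sorted(range(len(feature_names)), key=lambda j: scores[j], reverse=True)
    return [feature_names[j] for j in order]


def kfold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous folds.

    Simplification: no shuffling or stratification, which a real
    experiment would usually add.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics, with 1 = poor and 0 = non-poor (assumed coding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from the predicted and true label rates.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "mcc": mcc, "kappa": kappa}


if __name__ == "__main__":
    # Hypothetical feature names and SHAP values, for illustration only.
    names = ["household_size", "owns_radio", "roof_material"]
    shap_vals = [[0.1, -0.4, 0.05], [-0.2, 0.6, 0.0]]
    print(rank_by_mean_abs_shap(shap_vals, names)[:2])  # ['owns_radio', 'household_size']
    print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]))
```

Keeping the first k names returned by `rank_by_mean_abs_shap` and retraining on only those columns within each fold mirrors the structure of the top-50, top-20 and top-10 experiments. MCC and Cohen's kappa are included alongside accuracy because they correct for agreement that an imbalanced poor/non-poor split would produce by chance.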
format	Thesis
id	oai:open.uct.ac.za:11427/36423
institution	University of Cape Town (South Africa)
language	eng
last_indexed	2026-06-10T12:31:30.019Z
license_str	Not specified — see source repository
provenance_str_mv	Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate	2022
publisher	Department of Computer Science
record_format	dspace
source_str	UCTD — University of Cape Town Open Access Repository
spelling	oai:open.uct.ac.za:11427/36423 Predicting household poverty with machine learning methods: the case of Malawi Chinyama, Francis Berman, Sonia computer science 2022-05-19T10:58:38Z 2022-05-19T10:58:38Z 2022 2022-05-19T10:58:06Z Master Thesis Masters MSc http://hdl.handle.net/11427/36423 eng application/pdf Department of Computer Science Faculty of Science
thesis_degree_str	Master's
title	Predicting household poverty with machine learning methods: the case of Malawi
topic	computer science
url	http://hdl.handle.net/11427/36423
