| Main Author: | Hlongwane, Rivalani |
|---|---|
| Other Authors: | Ramaboa, Kutlwano |
| Format: | Thesis |
| Language: | English |
| Published: | Graduate School of Business (GSB), 2025 |
| Subjects: | credit scorecard |
| access_status_str | Open Access |
|---|---|
| author | Hlongwane, Rivalani |
| author2 | Ramaboa, Kutlwano |
| collection | Thesis |
| description | This research addresses the dual challenges of improving credit scorecard accuracy and maintaining interpretability. While machine learning algorithms like random forest and eXtreme gradient boosting outperform traditional logistic regression in accuracy, their complex predictor variable representation hinders interpretability. To reconcile this, the study discretizes numerical variables, applies one-hot encoding, and employs Shapley values to derive interpretable credit scores for random forest, eXtreme gradient boosting, light gradient boosting machine, and categorical boosting models. This approach produces credit scorecards that align with industry standards. Additionally, the investigation into the role of alternative data in credit scoring reveals its impact on model accuracy. By analysing unique predictor variables such as an applicant's social circle default status, regional ratings, and local population size, the significance of alternative data is demonstrated. Leveraging the model-X knockoffs framework for predictor variable selection contributes to superior model performance, achieving the highest area under the curve on the Kaggle home credit data. |
| format | Thesis |
| id | oai:open.uct.ac.za:11427/41623 |
| institution | University of Cape Town (South Africa) |
| language | English |
| last_indexed | 2026-06-10T12:37:00.852Z |
| license_str | Not specified — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository |
| publishDate | 2025 |
| publisher | Graduate School of Business (GSB) |
| record_format | dspace |
| source_str | UCTD — University of Cape Town Open Access Repository |
| spelling | oai:open.uct.ac.za:11427/41623 Credit scorecards in retail banking: enhancing interpretability through shapley values and evaluating the effectiveness of alternative data for improved accuracy Hlongwane, Rivalani Ramaboa, Kutlwano credit scorecard This research addresses the dual challenges of improving credit scorecard accuracy and maintaining interpretability. While machine learning algorithms like random forest and eXtreme gradient boosting outperform traditional logistic regression in accuracy, their complex predictor variable representation hinders interpretability. To reconcile this, the study discretizes numerical variables, applies one-hot encoding, and employs Shapley values to derive interpretable credit scores for random forest, eXtreme gradient boosting, light gradient boosting machine, and categorical boosting models. This approach produces credit scorecards that align with industry standards. Additionally, the investigation into the role of alternative data in credit scoring reveals its impact on model accuracy. By analysing unique predictor variables such as an applicant's social circle default status, regional ratings, and local population size, the significance of alternative data is demonstrated. Leveraging the model-X knockoffs framework for predictor variable selection contributes to superior model performance, achieving the highest area under the curve on the Kaggle home credit data. 2025-08-26T09:00:37Z 2025-08-26T09:00:37Z 2025 2025-08-26T08:57:52Z Thesis / Dissertation Doctoral PhD http://hdl.handle.net/11427/41623 en eng application/pdf Graduate School of Business (GSB) Faculty of Commerce University of Cape Town |
| thesis_degree_str | Doctoral |
| title | Credit scorecards in retail banking: enhancing interpretability through shapley values and evaluating the effectiveness of alternative data for improved accuracy |
| topic | credit scorecard |
| url | http://hdl.handle.net/11427/41623 |