Using recency, frequency and monetary variables to predict customer lifetime value with XGBoost

Customer Relationship Management (CRM) will continue to gain prominence in the coming years. A commonly used CRM metric called Customer Lifetime Value (CLV) is the value a customer will contribute while they are an active customer. This study investigated the ability of supervised machine learning models constructed with XGBoost to...

Bibliographic Details
Main Author: Myburg, Marius Errol
Other Authors: Berman, Sonia
Format: Thesis
Language: English
Published: Department of Computer Science 2023
Subjects: computer science
_version_ 1867613172189364224
access_status_str Open Access
author Myburg, Marius Errol
author2 Berman, Sonia
collection Thesis
description Customer Relationship Management (CRM) will continue to gain prominence in the coming years. A commonly used CRM metric, Customer Lifetime Value (CLV), is the value a customer will contribute while they are an active customer. This study investigated the ability of supervised machine learning models constructed with XGBoost to predict future CLV, as well as the likelihood that a customer will drop to a lower CLV in the future. One approach to determining CLV, called the RFM method, isolates the recency (R), frequency (F) and monetary (M) values. The models used these RFM variables, and the study also assessed whether including temporal, product, and other customer transaction information helped the XGBoost classifier make better predictions. The classification models were constructed by extracting each customer's RFM values and transaction information from a Fast-Moving Consumer Goods dataset. Different variations of CLV were calculated through one- and two-dimensional K-means clustering of the M (Monetary), F and M (Profitability), F and R (Loyalty), and R and M (Burgeoning) variables. Two additional CLV variations were determined by isolating the M tercile segments and by applying a commonly used weighted-RFM approach. To test the effectiveness of XGBoost in predicting future timeframes, the dataset was divided into three consecutive periods, where the first period formed the features used to predict the target CLV variables in the second and third periods. Models that predicted whether CLV dropped to a lower value from the first to the second period and from the first to the third period were also constructed. The XGBoost models were found to be moderately to highly effective in classifying future CLV in both the second and third periods. The models also effectively predicted whether CLV would drop to a lower value in both future periods. The ability to predict future CLV and CLV drop in the second period was only slightly better than the ability to predict them in the third period. Models constructed by adding temporal, product, and customer transaction information to the RFM values did not improve on those that used only the RFM values. These findings illustrate the effectiveness of XGBoost as a predictor of future CLV and CLV drop, and affirm the efficacy of using RFM values to determine future CLV.
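The RFM extraction, M tercile segmentation and CLV-drop labelling described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy transactions, the tuple layout `(customer_id, date, amount)`, measuring recency in days before the period end, and restricting the drop label to customers present in both periods are all assumptions made for illustration. The K-means clustering and XGBoost classification steps are not shown.

```python
from collections import defaultdict
from datetime import date


def rfm_per_customer(transactions, period_end):
    """Compute R, F and M per customer for one period.

    transactions: iterable of (customer_id, date, amount) tuples (assumed layout).
    R = days between the last purchase and period_end, F = purchase count,
    M = total amount spent.
    """
    last_seen = {}
    freq = defaultdict(int)
    monetary = defaultdict(float)
    for cust, day, amount in transactions:
        if cust not in last_seen or day > last_seen[cust]:
            last_seen[cust] = day
        freq[cust] += 1
        monetary[cust] += amount
    return {
        cust: {
            "R": (period_end - last_seen[cust]).days,
            "F": freq[cust],
            "M": monetary[cust],
        }
        for cust in last_seen
    }


def tercile_segments(rfm):
    """Rank customers by M and split them into three near-equal groups.

    Returns 0 (lowest third), 1 (middle third) or 2 (highest third) per customer.
    """
    ranked = sorted(rfm, key=lambda cust: rfm[cust]["M"])
    n = len(ranked)
    return {cust: min(2, (3 * i) // n) for i, cust in enumerate(ranked)}


def clv_dropped(seg_before, seg_after):
    """Flag customers whose segment is lower in the later period.

    Assumption: only customers present in both periods are labelled.
    """
    return {
        cust: seg_after[cust] < seg_before[cust]
        for cust in seg_before
        if cust in seg_after
    }


# Hypothetical toy data for two consecutive periods.
period1 = [
    ("a", date(2023, 1, 5), 100.0),
    ("a", date(2023, 1, 20), 50.0),
    ("b", date(2023, 1, 10), 30.0),
    ("c", date(2023, 1, 25), 400.0),
]
period2 = [
    ("a", date(2023, 2, 3), 20.0),
    ("b", date(2023, 2, 10), 90.0),
    ("c", date(2023, 2, 15), 500.0),
]

rfm1 = rfm_per_customer(period1, date(2023, 1, 31))
rfm2 = rfm_per_customer(period2, date(2023, 2, 28))
seg1 = tercile_segments(rfm1)
seg2 = tercile_segments(rfm2)
dropped = clv_dropped(seg1, seg2)
```

In a setup like the one the abstract describes, the first-period values (`rfm1` here) would serve as features and the later-period segment or drop flag (`seg2`, `dropped`) as the classification target.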
format Thesis
id oai:open.uct.ac.za:11427/38088
institution University of Cape Town (South Africa)
language eng
last_indexed 2026-06-10T12:31:54.917Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate 2023
publisher Department of Computer Science
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/38088 Using recency, frequency and monetary variables to predict customer lifetime value with XGBoost Myburg, Marius Errol Berman, Sonia computer science 2023-07-12T10:20:29Z 2023-07-12T10:20:29Z 2023 2023-07-12T10:16:39Z Master Thesis Masters MSc http://hdl.handle.net/11427/38088 eng application/pdf Department of Computer Science Faculty of Science
thesis_degree_str Master's
title Using recency, frequency and monetary variables to predict customer lifetime value with XGBoost
topic computer science
url http://hdl.handle.net/11427/38088