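| Main Author: | Njati, Jolando |
|---|---|
| Other Authors: | Gumedze, Freedom |
| Format: | Thesis |
| Language: | English |
| Published: | Department of Statistical Sciences, 2022 |
| Subjects: | survival analysis; simulation; Cox proportional hazard; model selection; integrated area under the curve |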
| _version_ | 1867613248118849536 |
|---|---|
| access_status_str | Open Access |
| author | Njati, Jolando |
| author2 | Gumedze, Freedom |
| collection | Thesis |
| description | Advances in data acquisition technology continue to produce survival data sets with many covariates. This poses a new challenge for researchers: identifying the covariates that matter for inference and prediction when the response variable is a time to event. In this dissertation, common Cox proportional hazards model selection techniques and a random survival forest technique were compared using five performance measures: the concordance index, the integrated area under the curve, …, and R². To carry out this comparison, a multicentre clinical trial data set was used, and a simulation study was also implemented. To develop each Cox proportional hazards model, a training set of 75% of the observations was used, and the model selection techniques were applied to select covariates. Full Cox PH models containing all covariates were also fitted for both the clinical trial data set and the simulations. On the clinical trial data set, the full model and forward selection performed better on the performance metrics employed, although they do not reduce model complexity as much as the Lasso does. The simulation studies also showed that the full model performed better than the other techniques, with the Lasso over-penalising the model in the simulation with the smaller data set and many covariates. AIC- and BIC-based selection were less computationally efficient than the other variable selection techniques, but reduced model complexity more effectively than their counterparts in the simulations. The integrated area under the curve was the performance metric chosen for selecting the final model for analysis on the real data set. Across all selection techniques, this metric gave more efficient outcomes than the other metrics. This dissertation therefore showed that the results of variable selection techniques differ according to the study design and the performance measure used. Hence, to obtain a good model, a model selection technique should not be used in isolation. Further research is needed to identify and publish techniques that work well across different study designs, to shorten the process for most researchers. |
| format | Thesis |
| id | oai:open.uct.ac.za:11427/36594 |
| institution | University of Cape Town (South Africa) |
| language | eng |
| last_indexed | 2026-06-10T12:33:07.122Z |
| license_str | Not specified — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository |
| publishDate | 2022 |
| publisher | Department of Statistical Sciences |
| record_format | dspace |
| source_str | UCTD — University of Cape Town Open Access Repository |
| spelling | oai:open.uct.ac.za:11427/36594 Statistical model selection techniques for the cox proportional hazards model: a comparative study Njati, Jolando Gumedze, Freedom survival analysis simulation Cox proportional hazard model selection integrated area under the curve 2022-07-01T15:26:47Z 2022-07-01T15:26:47Z 2022 2022-07-01T15:24:00Z Master Thesis Masters MSc http://hdl.handle.net/11427/36594 eng application/pdf Department of Statistical Sciences Faculty of Science |
| thesis_degree_str | Master's |
| title | Statistical model selection techniques for the Cox proportional hazards model: a comparative study |
| topic | survival analysis simulation Cox proportional hazard model selection integrated area under the curve |
| url | http://hdl.handle.net/11427/36594 |
| work_keys_str_mv | AT njatijolando statisticalmodelselectiontechniquesforthecoxproportionalhazardsmodelacomparativestudy |
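The description above lists the concordance index among the performance measures used to compare the selection techniques. As an illustration only, here is a minimal Python sketch of Harrell's concordance index for right-censored data; it is not the code used in the thesis. The function name `harrell_c_index` is hypothetical. The sketch assumes that a higher risk score means a higher hazard (as with a Cox linear predictor), and it simply skips pairs with tied observed times. That skip is a simplifying assumption, since software packages handle time ties in different ways.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's concordance index for right-censored survival data (sketch).

    times:       observed follow-up time for each subject
    events:      1 if the event was observed, 0 if the subject was censored
    risk_scores: model output where a HIGHER score means HIGHER risk,
                 e.g. the linear predictor of a fitted Cox model (assumption)
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")

    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times (simplifying assumption) and pairs whose shorter time
        # is censored, because their true ordering of event times is unknown.
        if times[a] == times[b] or events[a] == 0:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier failure was predicted to be riskier
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions receive half credit
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data: subject 3 (time 6.0) is censored.
    times = [2.0, 4.0, 6.0, 8.0]
    events = [1, 1, 0, 1]
    risk = [0.9, 0.7, 0.3, 0.1]
    print(harrell_c_index(times, events, risk))  # 1.0
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0.0 means every comparable pair is ranked in reverse. The pairwise loop is O(n²), which is fine for illustration; production libraries use faster algorithms.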