Hospital readmission prediction with long clinical notes

Bibliographic Details
Main Author: Nurmahomed, Yassin
Other Authors: Buys, Jan
Format: Thesis
Language: English
Published: Department of Computer Science 2023
Subjects: Computer Science
access_status_str Open Access
author Nurmahomed, Yassin
author2 Buys, Jan
collection Thesis
description Electronic health record (EHR) data is captured across many healthcare institutions, resulting in large amounts of diverse information that can be analysed for the diagnosis, prognosis, treatment and prevention of disease. One type of data captured by EHRs is clinical notes, which are unstructured data written in natural language. We can leverage Natural Language Processing (NLP) to build machine learning (ML) models that extract understanding from clinical notes and enable us to predict clinical outcomes. ClinicalBERT is a pre-trained Transformer-based model that is trained on clinical text and can predict 30-day hospital readmission from clinical notes. Although its performance is good, it is limited in the length of the text sequence it can take as input. Models that use longer sequences have been shown to perform better on a range of ML tasks, including on clinical text. In this work, we evaluate Longformer, a model that can learn from longer sequences than previous models, after it has been pre-trained and then fine-tuned on clinical text. Performance is compared against Deep Averaging Network (DAN) and Long Short-Term Memory (LSTM) baselines and previous state-of-the-art models in terms of area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC) and recall at a precision of 70% (RP70). Longformer outperforms ClinicalBERT on two of the three metrics; however, it does not surpass one of the baselines on any metric. Training the model on early notes did not result in a substantial difference compared to training on discharge summaries. Our analysis shows that the model suffers from out-of-vocabulary words, as many biomedical concepts are missing from the original pre-training corpus.
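The description names three evaluation metrics: AUROC, AUPRC and RP70. The sketch below shows one common way to compute them from a model's predicted readmission probabilities in plain Python. It is illustrative only and is not the thesis's evaluation code. The function names are hypothetical. AUPRC is computed here as step-wise average precision, and other estimators such as trapezoidal interpolation give slightly different values. RP70 is assumed to mean the highest recall reached at any decision threshold whose precision is at least 0.70.

```python
from typing import List, Sequence, Tuple


def _check(labels: Sequence[int], scores: Sequence[float]) -> None:
    # The metrics are undefined unless both classes are present.
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    if not 0 < sum(labels) < len(labels):
        raise ValueError("need at least one positive and one negative example")


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random readmitted case (label 1) scores higher than
    a random non-readmitted case (label 0); ties count as half a win."""
    _check(labels, scores)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pr_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(precision, recall) at each distinct threshold, highest threshold first.
    A case is predicted 'readmitted' when its score >= threshold."""
    _check(labels, scores)
    total_pos = sum(labels)
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((tp / (tp + fp), tp / total_pos))
    return points


def auprc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise average precision: sum over thresholds of (R_k - R_{k-1}) * P_k, R_0 = 0."""
    ap, prev_recall = 0.0, 0.0
    for precision, recall in pr_points(labels, scores):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def recall_at_precision(labels: Sequence[int], scores: Sequence[float],
                        min_precision: float = 0.7) -> float:
    """Highest recall over thresholds with precision >= min_precision (0.0 if none)."""
    return max((r for p, r in pr_points(labels, scores) if p >= min_precision), default=0.0)


if __name__ == "__main__":
    y = [1, 0, 1, 1, 0, 0]              # 1 = readmitted within 30 days (toy data)
    p = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]  # predicted probabilities (toy data)
    print(f"AUROC={auroc(y, p):.3f} AUPRC={auprc(y, p):.3f} "
          f"RP70={recall_at_precision(y, p, 0.7):.3f}")
    # prints: AUROC=0.778 AUPRC=0.806 RP70=1.000
```

RP70 is a single operating point, whereas AUROC and AUPRC summarise all thresholds. This is one reason a model can rank differently against another model on different metrics.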
format Thesis
id oai:open.uct.ac.za:11427/37712
institution University of Cape Town (South Africa)
language eng
last_indexed 2026-06-10T12:35:19.444Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate 2023
publisher Department of Computer Science
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/37712 Hospital readmission prediction with long clinical notes Nurmahomed, Yassin Buys, Jan Computer Science 2023-04-13T10:49:24Z 2023-04-13T10:49:24Z 2022 2023-04-12T11:29:48Z Master Thesis Masters MSc http://hdl.handle.net/11427/37712 eng application/pdf Department of Computer Science Faculty of Science
thesis_degree_str Master's
title Hospital readmission prediction with long clinical notes
topic Computer Science
url http://hdl.handle.net/11427/37712
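The description attributes Longformer's potential advantage to learning from longer input sequences. BERT-style models such as ClinicalBERT use full self-attention over at most 512 tokens, so the number of token pairs grows quadratically with length. Longformer instead combines a sliding attention window with a few "global" tokens. The toy sketch below builds that attention pattern as a boolean mask to show why the number of attended pairs grows roughly linearly with sequence length. It illustrates the general mechanism from the Longformer paper and is not the thesis's model. The function names and the tiny sizes are hypothetical, and real implementations use banded kernels rather than a dense mask.

```python
from typing import Iterable, List


def sliding_window_mask(n: int, window: int,
                        global_tokens: Iterable[int] = ()) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j.

    Each token attends to window // 2 neighbours on either side, plus itself.
    Global tokens (e.g. a classification token) attend to every position,
    and every position attends to them.
    """
    half = window // 2
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(max(0, i - half), min(n, i + half + 1)):
            mask[i][j] = True
    for g in global_tokens:
        for k in range(n):
            mask[g][k] = True  # global token sees everything
            mask[k][g] = True  # everything sees the global token
    return mask


def attended_pairs(mask: List[List[bool]]) -> int:
    """Number of (query, key) pairs whose attention score must be computed."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    for n in (8, 16, 32):
        windowed = attended_pairs(sliding_window_mask(n, window=4, global_tokens=[0]))
        print(f"n={n:>2}: windowed+global={windowed:>4}, full={n * n:>5}")
```

Doubling `n` roughly doubles the windowed count but quadruples the full-attention count. This scaling is what lets a Longformer-style model read a whole discharge summary instead of truncating it or splitting it into short chunks.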