Data mining typically requires implementing operations that involve cross-cutting entity boundaries and are awkward to implement in document-oriented databases. CouchDB, for example, models entities as documents, with highly isolated entity boundaries, and on which joins cannot be directly performed...
| Main Author: | Smith, Zach |
|---|---|
| Other Authors: | Berman, Sonia |
| Format: | Thesis |
| Language: | English |
| Published: | Department of Computer Science, 2019 |
| Subjects: | Computer Science |
| _version_ | 1867611311238545408 |
|---|---|
| access_status_str | Open Access |
| author | Smith, Zach |
| author2 | Berman, Sonia |
| author_browse | Berman, Sonia Smith, Zach |
| author_facet | Berman, Sonia Smith, Zach |
| author_sort | Smith, Zach |
| collection | Thesis |
| description | Data mining typically requires implementing operations that involve cross-cutting entity boundaries and are awkward to implement in document-oriented databases. CouchDB, for example, models entities as documents, with highly isolated entity boundaries, and on which joins cannot be directly performed. This project shows how join and aggregation can be achieved across entity boundaries in such systems, as encountered for example in the pre-processing and exploration stages of educational data mining. A software stack is presented as a means by which this can be achieved; first, datasets are processed via ETL operations, then MapReduce is used to create indices of ordered and aggregated data. Finally, a CouchDB list function is used to iterate through these indices and perform joins, and to compute aggregated values on joined datasets such as variance and correlations. In terms of the case study, it is shown that the proposed approach to implementing cross-document joins and aggregation is effective and scalable. In addition, it was discovered that for the 2014–2016 UCT cohorts, NBT scores correlate better with final grades for the CSC1015F course than do Grade 12 results for English, Science and Mathematics. |
| format | Thesis |
| id | oai:open.uct.ac.za:11427/29530 |
| institution | University of Cape Town (South Africa) |
| language | eng |
| license_str | Not specified — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository |
| publishDate | 2019 |
| publishDateRange | 2019 |
| publishDateSort | 2019 |
| publisher | Department of Computer Science |
| publisherStr | Department of Computer Science |
| record_format | dspace |
| source_str | UCTD — University of Cape Town Open Access Repository |
| spelling | oai:open.uct.ac.za:11427/29530 Joining and aggregating datasets using CouchDB Smith, Zach Berman, Sonia Computer Science Data mining typically requires implementing operations that involve cross-cutting entity boundaries and are awkward to implement in document-oriented databases. CouchDB, for example, models entities as documents, with highly isolated entity boundaries, and on which joins cannot be directly performed. This project shows how join and aggregation can be achieved across entity boundaries in such systems, as encountered for example in the pre-processing and exploration stages of educational data mining. A software stack is presented as a means by which this can be achieved; first, datasets are processed via ETL operations, then MapReduce is used to create indices of ordered and aggregated data. Finally, a CouchDB list function is used to iterate through these indices and perform joins, and to compute aggregated values on joined datasets such as variance and correlations. In terms of the case study, it is shown that the proposed approach to implementing cross-document joins and aggregation is effective and scalable. In addition, it was discovered that for the 2014–2016 UCT cohorts, NBT scores correlate better with final grades for the CSC1015F course than do Grade 12 results for English, Science and Mathematics. 2019-02-14T13:15:50Z 2019-02-14T13:15:50Z 2018 2019-02-14T11:26:06Z Master Thesis Masters MSc http://hdl.handle.net/11427/29530 eng application/pdf Department of Computer Science Faculty of Science University of Cape Town |
| spellingShingle | Computer Science Smith, Zach Joining and aggregating datasets using CouchDB |
| thesis_degree_str | Master's |
| title | Joining and aggregating datasets using CouchDB |
| title_full | Joining and aggregating datasets using CouchDB |
| title_fullStr | Joining and aggregating datasets using CouchDB |
| title_full_unstemmed | Joining and aggregating datasets using CouchDB |
| title_short | Joining and aggregating datasets using CouchDB |
| title_sort | joining and aggregating datasets using couchdb |
| topic | Computer Science |
| url | http://hdl.handle.net/11427/29530 |
| work_keys_str_mv | AT smithzach joiningandaggregatingdatasetsusingcouchdb |
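
The description in this record outlines a pipeline in which key-ordered indices are walked in a single pass to join records and compute aggregates such as correlations. The following is a minimal Python sketch of that general idea, not the thesis's own code (which, per the description, uses CouchDB MapReduce views and a list function). The function names `merge_join` and `pearson`, the row shape, and the sample values are all hypothetical, and each key is assumed to appear at most once per index.

```python
from math import sqrt
from typing import Iterable, Iterator, Optional, Tuple

Row = Tuple[str, float]  # (join key, value), e.g. (student id, score); hypothetical shape


def merge_join(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Tuple[str, float, float]]:
    """Inner-join two row streams that are already sorted by key.

    Assumption: each key occurs at most once per stream. Real indices
    with duplicate keys would need extra handling.
    """
    left_it, right_it = iter(left), iter(right)
    l_row, r_row = next(left_it, None), next(right_it, None)
    while l_row is not None and r_row is not None:
        if l_row[0] == r_row[0]:
            yield l_row[0], l_row[1], r_row[1]
            l_row, r_row = next(left_it, None), next(right_it, None)
        elif l_row[0] < r_row[0]:
            l_row = next(left_it, None)   # left key has no partner; advance left
        else:
            r_row = next(right_it, None)  # right key has no partner; advance right


def pearson(joined: Iterable[Tuple[str, float, float]]) -> Optional[float]:
    """Pearson correlation from running sums, computed in one pass."""
    n = 0
    sx = sy = sxx = syy = sxy = 0.0
    for _key, x, y in joined:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    if n < 2:
        return None  # correlation is undefined for fewer than two pairs
    spread_x = n * sxx - sx * sx
    spread_y = n * syy - sy * sy
    if spread_x <= 0 or spread_y <= 0:
        return None  # one variable is constant, so correlation is undefined
    return (n * sxy - sx * sy) / sqrt(spread_x * spread_y)


if __name__ == "__main__":
    # Made-up rows standing in for two key-ordered indices.
    index_a = [("s01", 61.0), ("s02", 74.0), ("s04", 58.0)]
    index_b = [("s01", 55.0), ("s03", 80.0), ("s04", 49.0)]
    print(pearson(merge_join(index_a, index_b)))
```

Because both inputs are consumed in key order, the join needs no lookup table and holds only one row from each side at a time. The same running sums would also yield the variance of either joined column.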