Masters Programmes

The Department of Statistical Sciences offers six masters programmes:

  • Masters in Data Science (STA5080W & AST5005H/IBS5004H/CSC5009H/PHY5008H/STA5079H)
  • Masters in Advanced Analytics and Decision Sciences by coursework and half dissertation (STA5003W & STA5004W)
  • Masters in Biostatistics (STA5057W & STA5058W)
  • Masters in Mathematical Statistics by dissertation only (STA5000W)
  • Masters in Operational Research by dissertation only (STA5001W)
  • Masters in Ecological/Environmental Statistics by dissertation only (STA5013W)

STA5080W: Masters in Data Science

This is an interdisciplinary degree with participating departments: Statistical Sciences, Computer Science, Astronomy, Physics, and the Computational Biology group (Health Sciences Faculty). The degree is aimed at students who hold a good honours degree but do not have an advanced background in Statistics and Computer Science, although they have been exposed to mathematics and computing during their undergraduate studies. Students will learn the statistical and computing skills required to deal with Big Data from Astronomy, Physics, Medicine and Commerce.

The degree is open to students with at least 65% for an honours degree in any discipline that involved a substantial component of quantitative and computing training, as assessed by a selection committee made up of representatives from the contributing departments. The selection committee may require students to complete pre-courses (STA5014Z) successfully before they are allowed to register for the degree.

This masters degree is composed of two equally weighted components. STA5080W is the coursework component (90 credits), followed by a 50% dissertation (90 credits) on a selected research topic in one of the following: Data Science in Astronomy (AST5005H), Data Science in Bioinformatics (IBS5004H), Data Science in Computer Science (CSC5009H), Data Science in Physics (PHY5008H) or Data Science in Statistical Sciences (STA5079H).

Students are required to pass 5 compulsory and 2 elective modules. The overall mark for the coursework component is a weighted average of the individual module marks, with each module weighted by its contribution to the total credit count. Students must pass each individual module in order to pass the coursework component of the degree. The following core modules are compulsory:

Databases for Data Scientists CSC5007Z 12 credits
Statistical and High Performance Computing STA5075Z 12 credits
Data Visualization  CSC5008Z 12 credits
Unsupervised Learning STA5077Z 12 credits
Supervised Learning STA5076Z 18 credits

To complete the 90 credits, students can choose from the following elective modules. Not all modules are offered every year: the modules offered depend on staff availability, and the course is tailored to the interests and needs of the particular students.

Data Science for Astronomy AST5004Z 12 credits
Data Science for Particle Physics PHY5007Z 12 credits
Bioinformatics for high-throughput biology IBS5003Z 15 credits
Data Science for Industry STA5073Z 12 credits
Decision Modelling for Prescriptive Analytics STA5074Z 12 credits
Bayesian Decision Modelling STA5061Z 15 credits

Any other masters modules in Statistical Sciences or Computer Science. Specific entry requirements might apply to these modules.
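The credit-weighted coursework mark described above can be sketched in Python. This is a minimal illustration, not the department's procedure: the module marks are hypothetical, the 50% module pass mark is an assumption, and so is the choice to reject a selection worth fewer than 90 credits. With the 66 compulsory credits, two 12-credit electives bring the total to exactly 90.

```python
# Illustrative sketch: credit-weighted coursework mark for STA5080W.
# Module codes and credit values come from the programme description;
# the marks are hypothetical, and the 50% module pass mark and the
# 90-credit minimum check are assumptions.

REQUIRED_CREDITS = 90


def coursework_mark(results):
    """Credit-weighted average of module marks.

    results maps a module code to a (credits, mark_percent) pair.
    """
    total = sum(credits for credits, _ in results.values())
    if total < REQUIRED_CREDITS:
        raise ValueError(f"only {total} credits; at least {REQUIRED_CREDITS} needed")
    weighted = sum(credits * mark for credits, mark in results.values())
    return weighted / total


def passes_coursework(results, pass_mark=50):
    """Every module must be passed individually (pass_mark is an assumption)."""
    return all(mark >= pass_mark for _, mark in results.values())


results = {
    "CSC5007Z": (12, 70),  # Databases for Data Scientists
    "STA5075Z": (12, 70),  # Statistical and High Performance Computing
    "CSC5008Z": (12, 70),  # Data Visualization
    "STA5077Z": (12, 70),  # Unsupervised Learning
    "STA5076Z": (18, 75),  # Supervised Learning
    "STA5073Z": (12, 70),  # elective: Data Science for Industry
    "STA5074Z": (12, 70),  # elective: Decision Modelling for Prescriptive Analytics
}

print(coursework_mark(results))    # 71.0
print(passes_coursework(results))  # True
```

Because each mark is weighted by credits, the 18-credit Supervised Learning module counts one and a half times as much as each 12-credit module.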


STA5003W & STA5004W: Masters in Advanced Analytics and Decision Sciences

This masters degree is composed of two equally weighted components. STA5003W is the coursework component, followed by STA5004W, a 50% dissertation on a selected research topic. The degree is open both to students with a Statistics honours degree (with a mark of at least 65%) and to students without a statistics background who hold an honours degree (with a mark of at least 65%) in another discipline that contained a substantial component of quantitative training, as assessed by the head of the Department of Statistical Sciences. Students who do not have a qualification in Statistics at honours level will be required to pass some pre-courses (STA5014Z) before being allowed to register for the degree.

STA5003W requires the student to complete 6 modules of 15 credits each. Students complete the core modules for their specialisation plus elective modules, and may take up to two modules offered by a different department or at honours level. The following modules form part of the masters programme, although not all modules are offered every year: the modules offered depend on staff availability, and the course is tailored to the interests and needs of the particular students. The specialisation modules allow for streaming in the popular areas of Statistics and its applications: Data Science, Mathematical Statistics, Biological Statistics, Financial Statistics and Operations Research.

General degree: No Specialization

Simulation and Optimisation STA5071Z 15 credits
Advanced Topics in Regression STA5090Z 15 credits
Multivariate Statistics STA5069Z 15 credits

Financial Specialization

Simulation and Optimisation STA5071Z 15 credits
Financial Econometrics STA5065Z 15 credits
Advanced Portfolio Theory STA5086Z 15 credits

Operational Research Specialization

Simulation and Optimisation STA5071Z 15 credits
Problem Structuring and Systems Dynamics STA5070Z 15 credits

Advanced Analytics Specialization

Simulation and Optimisation STA5071Z 15 credits
Advanced Topics in Regression STA5090Z 15 credits
Multivariate Statistics STA5069Z 15 credits
Machine Learning STA5068Z 15 credits
Python Programming (offered by the Computer Science Department) 15 credits
Database Management (offered by the Computer Science Department) 15 credits

Elective Modules:

Advanced Topics in Regression STA5090Z 15 credits
Multivariate Statistics STA5069Z 15 credits
Financial Econometrics STA5065Z 15 credits
Advanced Portfolio Theory STA5086Z 15 credits
Problem Structuring and System Dynamics STA5070Z 15 credits
Machine Learning STA5068Z 15 credits
Bayesian Decision Modelling STA5061Z 15 credits
Causal Modelling STA5062Z 15 credits
Design of Clinical Trials STA5063Z 15 credits
Ecological Statistics STA5064Z 15 credits
Mathematical Modelling for Infectious Diseases STA5066Z 15 credits
Longitudinal Data Analysis STA5067Z 15 credits
Survival Analysis STA5072Z 15 credits
Bioinformatics for high-throughput biology IBS5003Z 15 credits
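The STA5003W selection rules above (six 15-credit modules, the core modules of the chosen specialisation, and at most two modules from another department or at honours level) can be expressed as a simple check. This is a hypothetical sketch based on one reading of the text: the function `check_selection`, the way external modules are flagged, and the omission of the Advanced Analytics stream (whose Computer Science modules have no codes listed) are all assumptions.

```python
# Illustrative sketch of the STA5003W module-selection rules.
# Core module sets are taken from the specialisation tables on this page
# (Advanced Topics in Regression as STA5090Z). The function name and the
# way "external" modules are flagged are hypothetical.

CORE = {
    "General": {"STA5071Z", "STA5090Z", "STA5069Z"},
    "Financial": {"STA5071Z", "STA5065Z", "STA5086Z"},
    "Operational Research": {"STA5071Z", "STA5070Z"},
}
MODULES_REQUIRED = 6
CREDITS_PER_MODULE = 15
MAX_EXTERNAL = 2  # modules from another department or at honours level


def check_selection(specialisation, modules, external=frozenset()):
    """Return a list of rule violations; an empty list means none were found.

    modules:  set of chosen module codes.
    external: the chosen modules taken outside the department or at honours level.
    """
    problems = []
    if len(modules) != MODULES_REQUIRED:
        problems.append(f"expected {MODULES_REQUIRED} modules, got {len(modules)}")
    missing = CORE[specialisation] - set(modules)
    if missing:
        problems.append("missing core modules: " + ", ".join(sorted(missing)))
    if len(set(external) & set(modules)) > MAX_EXTERNAL:
        problems.append(f"more than {MAX_EXTERNAL} external modules")
    return problems


choice = {"STA5071Z", "STA5065Z", "STA5086Z", "STA5069Z", "STA5068Z", "STA5061Z"}
print(check_selection("Financial", choice))  # []
print(check_selection("General", choice))    # ['missing core modules: STA5090Z']
print(len(choice) * CREDITS_PER_MODULE)      # 90
```

Returning a list of problems rather than stopping at the first one lets a draft selection be checked against every rule at once.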

STA5057W & STA5058W: Masters in Biostatistics

The coursework component (STA5057W) of the MSc degree in Biostatistics aims to train students in the more advanced statistical methodology needed for the analysis of data from the Health and Biological Sciences. Students need to complete 6 modules, of which 4 are compulsory. Depending on the student's statistics background, two options are available:

Option 1:

Multivariate Statistics STA5069Z 15 credits
Longitudinal Data Analysis STA5067Z 15 credits
Survival Analysis STA5072Z 15 credits
Design of Clinical Trials STA5063Z 15 credits

Option 2:

Biostatistics Honours Module 15 credits
GLM Theory 7.5 credits
Multivariate Honours Module 15 credits
Longitudinal Data Analysis STA5067Z 15 credits
Survival Analysis STA5072Z 15 credits
Design of Clinical Trials STA5063Z 15 credits

To complete a total of 90 credits, students may choose from the electives:

Advanced Topics in Regression STA5090Z 15 credits
Simulation and Optimisation STA5071Z 15 credits
Machine Learning STA5068Z 15 credits
Bayesian Decision Modelling STA5061Z 15 credits
Mathematical Modelling for Infectious Diseases STA5066Z 15 credits
Ecological Statistics STA5064Z 15 credits
Causal Modelling STA5062Z 15 credits

Students may also choose a maximum of two modules from other departments or at honours level. Students who do not have a qualification in Statistics at honours level will be required to pass some pre-courses (STA5014Z) before being allowed to register for the degree. These include Introductory Calculus, Matrix Methods, Introductory Inference and R Programming. These transition students will be required to take the Biostatistics Honours module as a compulsory module and may take Multivariate Statistics at honours rather than masters level. After successful completion of the coursework component, students register for the dissertation component of the degree.

The research component (STA5058W) of the degree is a 90-credit dissertation. The research topic will be based on methodological or applied problems from the Health or Biological Sciences. For the duration of their dissertation, students may be based in the research unit in which the research originated.

On completion of the research component, and the preceding coursework component, students will be able to: (1) conduct collaborative research in the health and biological sciences, (2) conduct independent research in statistical methodology for the health and biological sciences, and (3) act as statistical consultants for health and biological sciences research.


STA5000W: Masters in Mathematical Statistics

The topic of the masters degree is decided in conjunction with a supervisor. Although every effort will be made to link potential students with a supervisor in the field of the submitted research proposal, it remains the responsibility of the applicant to secure a commitment from a suitable supervisor. Our staff's research spans Astrostatistics, Biostatistics and Bioinformatics, Ecological statistics, Econometrics and Financial modelling, Multivariate statistics, Stochastic processes, Spatial statistics and Statistical Education. For more information on the specific specialisations of staff members, see the Academic staff page.

STA5001W: Masters in Operational Research

The topic of the masters degree is decided in conjunction with a supervisor. Although every effort will be made to link potential students with a supervisor in the field of the submitted research proposal, it remains the responsibility of the applicant to secure a commitment from a suitable supervisor. The research fields of our staff focus mainly on Decision modelling, Problem structuring and Project management. For more information on the specific specialisations of staff members, see the Academic staff page.

STA5013W: Masters in Ecological/Environmental Statistics

The topic of the masters degree is decided in conjunction with a supervisor. Although every effort will be made to link potential students with a supervisor in the field of the submitted research proposal, it remains the responsibility of the applicant to secure a commitment from a suitable supervisor. This degree is intended to allow students to do research at masters level in the combined fields of biology, ecology, the environment and statistics. The thesis should represent an in-depth quantitative analysis of biological, environmental or ecological data. For more information on the specific specialisations of staff members, see the Academic staff page.

STA5059Z: Topics in Biostatistics A

The aim of this module is to allow occasional students and students registered for other degree programmes to register for a single module that forms part of the MSc in Biostatistics. Possible modules include Multivariate Statistics, Longitudinal Data Analysis, Survival Analysis, Design and Analysis of Experiments in the Health Sciences, Advanced Topics in Regression, Simulation and Optimisation, Machine Learning, Bayesian Decision Analysis, Infectious Disease Modelling, Ecological Statistics and Structural Equation Modelling.

STA5060Z: Topics in Biostatistics B

The aim of this module is to allow occasional students and students registered for other degree programmes to register for a single module that forms part of the MSc in Biostatistics. Possible modules include Multivariate Statistics, Longitudinal Data Analysis, Survival Analysis, Design and Analysis of Experiments in the Health Sciences, Advanced Topics in Regression, Simulation and Optimisation, Machine Learning, Bayesian Decision Analysis, Infectious Disease Modelling, Ecological Statistics and Structural Equation Modelling.

STA5005H: Special Topics in Statistics B

The course code for Special Topics in Statistics B allows students from other departments or faculties to register for a 15-credit module in the Department of Statistical Sciences. Acceptance into any of the specialisation modules will be based on the student's individual academic background.


Entrance requirements

A relevant Honours degree or an equivalent qualification (equivalent to a UCT Honours degree), with at least a good second-class pass (above 65%).

Application procedure

To apply to the department, send an e-mail containing the following to Ms Celene Jansen-Fielies (celene.jansen-fielies@uct.ac.za):

  • The completed expression of interest form
  • Full academic transcripts for all courses not completed at UCT
  • A two-page CV

The closing date for applications for STA5003W, STA5057W and STA5080W is 31 October, for potential registration in February of the following year. Note that lectures start two weeks before the start of the undergraduate academic year.

For STA5000W, STA5001W or STA5013W, also include:

  • A one-page research proposal

Students are welcome to start the application process for these degrees at any time during the academic year, although registration usually takes place in February or July.

Once the department has indicated provisional acceptance into the masters programme, submit an official application to the Science Faculty by completing the online application form.

Finance

You need to ensure that you have sufficient funds to cover your fees and living expenses. A limited number of university and other bursaries are available; you need to apply separately for such funding (http://www.uct.ac.za/apply/funding/postgraduate/applications/). A limited number of tutoring positions are available in the department. The pay depends on your duties and is typically no more than R1500 per month for eight or nine months of the year. Note that an offer of, or acceptance into, a postgraduate programme does not automatically entitle you to a tutorship. The department does not offer any financial assistance to students, and it is imperative that students ensure their own financial needs are covered before they arrive at UCT.

Language requirements

The official language of the university is English. Students may be required to take an English proficiency test. For more information on postgraduate study at UCT (application procedure, funding and rules), please consult http://www.uct.ac.za/apply/applications/postgraduates. Note that the department’s approval of your application is a requirement for registration, but the Faculty may have additional requirements.


Masters Modules

Summary of Masters Modules

Advanced Portfolio Theory (STA5086Z)

This course is intended to expose students to more advanced topics in portfolio theory, portfolio management and risk management. Statistical techniques used in the models include optimisation, simulation, spectral decomposition of the covariance matrix and robust optimisation. Nevertheless, the emphasis in this course is on the practical application of the models and theories, so there will be an emphasis on quantifying these measures and parameterising the models in a South African (and African) setting. There will also be a focus on the interpretation of, and linkages between, the concepts.

Advanced Topics in Regression Analysis (STA5090Z)

In this module, basic regression concepts shall be examined before moving on to advanced methods that allow for more flexibility in modelling. Topics to be covered include Ordinary Least Squares Regression, Subset Selection, Shrinkage Methods, Principal Component Regression and Partial Least Squares Regression, Piecewise Polynomials, Smoothing Splines, Wavelet Smoothing, Kernel Smoothing Methods, Mixture Models and Generalised Additive Models.

Bayesian Decision Analysis (STA5061Z)

The aim is to provide the student with a broad background in the Bayesian approach to decision analysis and statistical inference, addressing in particular:

  • The theoretical, philosophical and behavioural background of subjective probability and subjective expected utility (SEU);
  • The interpretation of this background for statistical inference, with examples from a variety of contexts;
  • The computational tools needed for implementing Bayesian statistical inference in practice;
  • The role of Bayesian networks for modelling inference and decision making in complex systems.

Bioinformatics for high-throughput biology (IBS5003Z)

This course aims to introduce students to bioinformatics techniques related to the processing, analysis and interpretation of high-throughput biological data. It will cover the analysis of next-generation sequencing (NGS) data of different types (metagenomic, RNA-Seq and full genome); statistical analysis of NGS data in relation to its associated metadata; phylogenetic analysis of sequence data; and the analysis of NGS or array data in medical population genetics. Students who complete the course will be skilled both in handling big biological data sets and in their downstream interpretation.

Causal Modelling (STA5062Z)

This module introduces students to the concepts of causality, causal diagrams and causal modelling. Topics to be covered include Counterfactual Theory, Directed Acyclic Graphs, Propensity Scores, Inverse Probability Weighting, Marginal Structural Models, G-estimation, Path Analysis, Confirmatory Factor Analysis, Structural Equation Modelling (SEM), Multiple Group SEM, MIMIC (Multiple Indicator and Multiple Causes) Models, Multilevel SEM, and Latent Growth Curve SEM. The course covers both the theory and the application of the methods with computer software such as R, STATA and LISREL.

Databases for Data Scientists (CSC5007Z)

This course will introduce students with little or no prior experience to the three cornerstone database technologies for big data, namely relational, NoSQL and Hadoop ecosystems. The course aims to give students an understanding of how data is organised and manipulated at large scale, and practical experience in the design and development of such databases using open-source infrastructure. The relational part will cover conceptual, logical and physical database design, including ER modelling and normalisation theory, as well as SQL coding and best practices for performance enhancement. NoSQL databases were developed for big data and semi-structured data applications where relational systems are too inefficient; all four types of NoSQL architecture will be introduced. Distributed data processing is key to manipulating large data sets effectively, so the final section of the course will teach the popular Hadoop technologies for distributed data processing, such as MapReduce programming and the execution model of Apache Spark.

Data Science for Astronomy (AST5004Z)

This course introduces students to various aspects of data-intensive astrophysics, ranging from data visualisation and complex databases to advanced statistical tools for astronomical data analysis and computational astrophysics. At the core of this module are examples in modern data-intensive astrophysics derived from the global data challenges around MeerKAT, the Square Kilometre Array (SKA), associated projects in radio astronomy, and other large multi-wavelength surveys. Students will be introduced to the use of Bayesian statistics in astronomy, the complexity of visualising large data cubes, optimising database operations in the presence of multi-dimensional data, data mining and discovery tools, and the role of large-scale simulations in interpreting the significance of astronomical observations.

Data Science for Industry (STA5073Z)

This course seeks to equip students with the skills required for a career in Data Science within industry. Topics covered include A/B Testing, Design of Experiments (including Randomization, Block Design and Replication), Natural Language Processing and Recommendation Systems. It also teaches students how to deal with non-standard datasets such as images, audio recordings and network graphs.

Data Science for Particle Physics (PHY5007Z)

This course introduces students to the important computational aspects of particle physics research. Using examples from current research at the European Organization for Nuclear Research (CERN), students are introduced to: the basic principles of particle physics; the Grid computing model employed by the Worldwide LHC Computing Grid (WLCG); the simulation of particle physics data; the ROOT data analysis tool used by all the large particle physics collaborations; and the signal extraction and significance estimation techniques employed in the most recent particle discoveries, including concepts such as nuisance parameters and the look-elsewhere effect.

Data Visualization (CSC5008Z)

Visualization is the graphical representation of data with the goal of improving comprehension, communication, hypothesis generation and decision making. This course aims to teach the principles of effective visualization of large, multidimensional data sets. We cover the field of visual thinking, outlining current understanding of human perception and demonstrating how this knowledge can be used to create more effective data visualizations.

Decision Modelling for Prescriptive Analytics (STA5074Z)

This course aims to develop an understanding of the role of formal (soft and hard; deterministic and stochastic) modelling in decision support and analysis, to develop an understanding of the key technologies behind decision modelling for prescriptive analytics, and to introduce new tools and techniques for analysing data in new ways to improve decision making.

Design of Clinical Trials (STA5063Z)

This module will look at the Design of Clinical Trials. Concepts of randomisation, replication and blocking will be discussed. Students will be introduced to the different phases of trial design, that is, Phases I, II, III and IV. Specific designs that will also be covered include, inter alia, randomised trials, dose-escalation studies, cross-over trials, PK/PD studies, designs for survival studies and multi-centre trials. The implications of the specific design for the analysis of the data will be discussed.

Ecological Statistics (STA5064Z)

This module will cover the latest statistical methods particular to ecological statistics. Topics to be covered include Capture-Mark-Recapture Models (Closed and Open Populations, Multi-state Models), Distance Sampling, Occupancy Models and State-Space Models in Ecology.

Financial Econometrics (STA5065Z)

This module comprises an advanced econometric and quantitative perspective on the following key areas: market efficiency in macro-economic markets, including the JSE, the bond market and short-term interest rate markets; characteristics of the JSE and its sectors; appropriate return transformations; the notion of company-specific, sector-specific and market-wide effects; a special focus on the rand/dollar (R/$) exchange rate, its effect on local markets (JSE and bond), the causes of its changes and modelling its impact on inflation; and technical modelling of the bond market (Nelson-Siegel parameterisation) and the share market (Black-Scholes; derivatives).

Longitudinal Data Analysis (STA5067Z)

This module will look at the latest methods for the analysis of longitudinal data. Longitudinal data arise from repeated measurements of the same variable on a single observational unit. This leads to complex data structures that need to be taken into account, to decisions about the particular hypothesis of interest, to a consideration of appropriate functional forms to characterise the longitudinal profile, and to potentially complex problems arising from missing data. Topics to be covered include: introduction to longitudinal data and linear mixed effect models; Generalized Estimating Equations; generalized linear mixed effect models; nonlinear mixed effect models, including PK/PD modelling and growth curve modelling; smoothing spline models; missing data; and causal models.

Machine Learning (STA5068Z)

This course is highly recommended for those who wish to pursue a career in Data Science. The course serves as an overview of the increasingly important field of Machine Learning. An introduction is given to the key concepts, goals and terminology of Machine Learning. Subsequently, the lectures cover some basic theory and techniques that can serve to guide the analysis of large datasets. The implementation of some popular learning algorithms is examined. This includes Neural Networks, Support Vector Machines, Boosting and Random Forests. Throughout the course, comparisons and contrasts are made with traditional statistical practice.

For students wishing to specialise in Data Science, it is recommended that one also takes the “Programming in Python” and “Database System” modules of the Masters in Information Technology offered by the Computer Science department.

Mathematical Modelling for Infectious Diseases (STA5066Z)

Infectious diseases remain a leading cause of morbidity and mortality worldwide, with HIV, tuberculosis and malaria estimated to cause 10% of all deaths each year. Mathematical models are increasingly used to understand the transmission of infections and to evaluate the potential impact of control programmes in reducing morbidity and mortality. Applications include determining optimal control strategies against new or emergent infections, such as swine flu or Ebola, or against HIV, tuberculosis and malaria, and predicting the impact of vaccination strategies against common infections such as measles and rubella. This course will cover introductory and advanced concepts in mathematical modelling, including deterministic and stochastic models, differential equation modelling, individual-based (agent-based) models, spatial models and computer simulation. Concepts covered include model building, equilibrium analysis, statistical data fitting, sensitivity analysis, public health modelling and an introduction to health economics modelling.

Multivariate Statistics (STA5069Z)

In this module, multivariate statistical analysis methods with associated graphical representations will be discussed. Topics to be covered include Principal Component Analysis and PCA biplots, Simple and Multiple Correspondence Analysis, Multidimensional Scaling, Cluster Analysis, Discriminant Analysis, Canonical Variate Analysis, Analysis of Distance.

Problem Structuring and System Dynamics (STA5070Z)

Problem Structuring: This section aims to explore a number of tools and methods that support the initial phases of a process of enquiry or analysis. Our interest is in understanding both the epistemological basis of different approaches and the extent to which they add rigour and promote insight. We will critique the efficacy of different approaches through a variety of case studies.

System Dynamics: This section extends qualitative systems understanding to more formal, quantified computer-based models that can be used in simulation mode. The purpose is to understand the system effects of complexities such as feedback loops, and to integrate softer subjective insights into quantitative models to explore their potential effects.

Simulation and Optimisation (STA5071Z)

This course provides a thorough introduction to simulation and optimization methods used to solve both statistical and broader decision problems. Topics to be covered include: Simulation (Random Number Generation, Monte Carlo Methods, Statistical Analysis of Simulated Data, Variance Reduction, Rejection sampling, Bootstrap Methods, Markov Chain Monte Carlo); Fundamentals of Linear and Nonlinear Optimization (Linear Programming and the Simplex Algorithm, Duality, Nonlinear Programming, Formulation and computer implementation of large-scale mathematical programming models) and Stochastic Optimization (Metaheuristics, Random Search, Simulated Annealing, Evolutionary and Genetic Algorithms, Tabu Search, Partition Algorithms).

Statistical and High Performance Computing (STA5075Z)

This course aims to provide students with a foundation in statistical computing for data science. The course is divided into three sections, namely Basic Programming, High Performance Computing and Simulation & Optimisation. In the first section, students will learn how to write computer programs to analyse data with the R Language and Environment for Statistical Computing. Students will then be taught how to run jobs in parallel on a remote computer cluster using a Linux command prompt. Finally, the course will introduce students to the fundamental principles and uses of simulation and optimisation.

Supervised Learning (STA5076Z)

Supervised learning is a set of statistical modelling tools for predicting or estimating the relationships between predictor and target variables in complex data sets. As part of the Masters in Data Science degree, this course aims to familiarise students with the statistical methodology needed to analyse the relationships between predictor and target variables in big data. Students should be able to apply appropriate statistical methods, such as Generalized Linear Models, Tree-Based Methods, Multivariate Methods, Feature Extraction, Support Vector Machines and Neural Networks, to analyse a big data set and estimate the relationships between the predictor and target variables.

Survival Analysis (STA5072Z)

This module will look at the latest methods for the analysis of time-to-event data, including: censoring mechanisms (type 1 right censoring, type 2 right censoring, interval censoring and left censoring); the survival likelihood; the Kaplan-Meier method and its variance; confidence intervals for the survival function; hypothesis testing in the nonparametric setting (logrank test, test for trend); the Cox Proportional Hazards model (assumptions, model building, diagnostic techniques, checking the proportional hazards assumption); parametric survival models in the proportional hazards metric (Exponential, Weibull); parametric survival models in the accelerated failure time metric; the extended Cox model, interactions and time-varying covariates; joint modelling; multivariate and clustered survival data; and frailty models.

Unsupervised Learning (STA5077Z)

This course aims to familiarise students with the statistical methodology needed to analyse relationships between variables in big data without designating predictor and response variables. Topics covered include association rules and market basket analysis, self-organising maps, multidimensional scaling, cluster analysis and principal component analysis.