University Of Cape Town Data Science

Statistics is the scientific application of mathematical principles to the collection, analysis, and presentation of data. Statisticians contribute to scientific enquiry by applying their mathematical and statistical knowledge to the design of surveys and experiments; the collection, processing, and analysis of data; and the interpretation of the results.

Who would be interested in studying statistics?

Statistics is a mathematical science, and so a taste and aptitude for mathematical thinking is a crucial ingredient. The field of statistics, like other areas of applied mathematics, often attracts those interested in the analysis of patterns in data: developing, understanding, abstracting, and packaging analytical methods for general use in other subject areas. Statistics is also, by definition, an information science. Imaginative use of both computing power and new computing environments drives much current research – so an interest in computation and/or computer science can also be a start for a statistician.

Career opportunities for graduates

One advantage of working in statistics is that you can combine your interest with almost any other field in science, technology, or business. Statisticians are employed in many fields, including biology, finance, economics, engineering, medicine, public health, psychology, marketing, education and sports. In all of these areas and many others, statisticians work closely with other scientists and researchers to develop new statistical techniques, adapt existing techniques, design experiments, and direct analyses of surveys and retrospective studies.

Statistics for Mathematical Disciplines

The aim of STA1006S is to provide students who intend to major in Mathematical Statistics with a solid foundation in the mathematical aspects of statistics required in the training of a professional statistician. The material for STA1006S places more emphasis on the theoretical and mathematical aspects of Statistics than STA1000S. As a result, the course is taught at a much faster pace than STA1000S. The breadth and depth of the STA1006S syllabus mean that the course demands a hard-working attitude and an effective study strategy from students.

Statistical Theory and Inference

STA2004F is a rigorous introduction to the foundations of mathematical statistics and aims to provide students with a deeper understanding of the statistical concepts covered in STA1006S. The course is intended for students studying mathematical or actuarial science. STA2004F is divided into two broad sections: (1) Probability and Distribution Theory and (2) Statistical Inference. During the first part of the course, students will learn to derive the distributions of random variables and their transformations, and explore the limiting behaviour of sequences of random variables. The last part of the course covers the estimation of population parameters and hypothesis testing based on a sample of data.

Linear Models

STA2005S consists of three sections. The first five weeks focus on the general linear model and regression theory, as well as the practical application of multiple regression using R. The second five weeks cover the design and analysis of experiments (completely randomised, randomised block and Latin square designs, factorial experiments, and a brief introduction to random effects). The last two weeks cover basic non-parametric statistics. The course covers the theory, but there is also a strong emphasis on applying the theory to data and on analysing data using statistical software.

Markov Processes and Time Series

STA3041F comprises two distinct sections. The first six weeks focus on Stochastic Processes: an introduction to discrete Markov chains, followed by Branching Processes, Counting of Events and Ruin Theory. The second part of the course presents different methods for the analysis of Time Series. These include AR processes, MA processes, ARIMA processes and a brief introduction to GARCH modelling.

Decision Theory and Generalized Linear Models

This third year second semester course consists of two sections:

  1. Decision and Risk Theory covers the structure of decision making under uncertainty; game theory and non-probabilistic decision criteria; probabilistic decision criteria, expected value and utility; use of Bayes’ theorem; value of information; Bayesian statistical analysis for Bernoulli and normal sampling; empirical Bayes and credibility theory; loss and extreme value distributions; and the Monte Carlo method.
  2. Generalized Linear Models introduces the exponential family of distributions and covers the definition of a GLM, estimation and inference of GLMs, applications of GLMs to insurance and other data, including logistic, Poisson and log-linear models as well as models for continuous responses with skew distributions.

Advanced Stochastic Processes

The course covers the analysis and modelling of stochastic processes. Topics include Poisson processes, Markov chains, random walks, measure-theoretic probability, martingales, stopping theorems, Brownian motion, stochastic integration and Itô calculus.

Applied Statistics Stream

Statistics 1000

This course provides an introduction to the study of Statistics and explores some of the foundations of the discipline, including exploratory data analysis, probability and probability distributions, statistical inference, tests of association and regression. Tutorials are split between classroom sessions, which focus on solving exam-type problems, and computer lab sessions, in which Excel is used as a platform both to explore statistical theory and to perform statistical calculations.

Statistics 1001

The objective of this course is to introduce first-year students to the basic concepts of linear algebra and differential calculus. The course makes these concepts accessible through a wide range of real-life applications, such as rates of change and finding optimum solutions to linear programming problems. This course is primarily intended for EBE and Humanities students. The course outline includes linear algebra, differentiation, logarithmic and exponential functions, applications of differentiation, integration, linear programming and compound interest.

Bionumeracy

This course provides an introduction to the study of Statistics within a biological context and explores some of the foundations of the discipline including exploratory data analysis, probability and probability distributions, statistical inference and regression. Practical data analysis skills are taught in lab sessions that use Excel as a platform, and students will learn how to apply the statistical theory being covered in lectures to real data sets. These skills will be important and relevant to students when they need to analyse data for research projects in other courses.

Statistics 1000 (Commerce Education Development Unit)

Exploratory data analysis and summary statistics. Probability theory. Random variables. Probability mass and density functions. Binomial, Poisson, exponential, normal and uniform distributions. Sampling distributions. Confidence intervals. Introduction to hypothesis testing. Tests on means, variances and proportions. Determining sample size. Simple linear regression and measures of correlation.

Statistics 1001 (Commerce Education Development Unit)

Functions and graphs: straight lines, polynomials, exponential and logarithmic functions; Differential calculus; The Mathematics of Finance; Matrix algebra; Linear Programming; Binomial Theorem. Emphasis will be placed on areas of interest to Commerce students, including applications to Economics.

Applied Statistical Modelling

The course aims to equip students with practical experience and skills in analysing data, using some statistical techniques frequently used in the sciences. The skills include designing experiments, choosing appropriate statistical methods for visual display and statistical modelling of data, model checking, interpretation and reporting of statistical results, and understanding limitations of statistical methods and data. By the end of the course the student should have gained enough confidence to transfer these skills to new problems or data sets in their own profession.

Business Statistics

This course is designed to extend the student’s basic statistical knowledge, acquired in STA1000F/S. Applied techniques which have direct application in all the management functional areas such as Marketing, Finance, Production, Human Resource Management and Information Systems will be addressed. Students will be introduced to analysis of variance, simple and multiple regression, model building, time series analysis and non-parametric techniques. Students will continue to analyze data using Excel.

Theory of Statistics

This course explores aspects of probability theory that are particularly relevant to statistics, including random variables, joint probability distributions, expected values and moment generating functions. The course also familiarises students with statistical data analysis techniques such as the Chi-square test of independence and matched-pairs designs. The course outline includes univariate distributions and their moments, moments of bivariate distributions, distributions of sample statistics and regression analysis.

Research and Survey Statistics

STA3022F covers the application of multivariate statistical techniques. These have the aim of uncovering relationships between two or more variables. Students are exposed to a wide range of methods, including many of the most popular methods currently used in industry and general research. The focus of the course is on practical application and interpretation of results from these methods, rather than the underlying theory. Extensive use is made of examples and students are given practical training on applying the methods using statistical software.

Inferential Statistics

STA3030F provides a thorough introduction to the underlying principles of inferential statistics. Inference lies at the heart of statistical thinking. It provides a systematic approach for assessing how uncertainty introduced by sampling affects our ability to make meaningful statements about a range of phenomena. Since much of scientific and business research is based on experimenting with or observing a sample, statistical inference has become, to a large extent, the workhorse of modern research. The focus of the course is on providing students with a greater depth of understanding about standard inferential tools – confidence intervals, hypothesis testing, and parameter estimation – that were covered in earlier courses.

Operational Research Techniques

This course is an introduction to the study of Operations Research and explores some of the fundamental quantitative techniques in the Operations Research toolkit. The course is intended for students in the applied statistics stream but may be taken as an elective by students in the mathematical statistics stream. The course is divided into four major sections: Mathematical programming (linear and non-linear programming to find optimal solutions to objectives subject to a series of constraints), Computer Simulation (mimicking the operation of real-world systems as they evolve over time), Decision-making under uncertainty (exploration of decision rules and tools) and Forecasting using time series. The course is practical and enjoyable, and exposes statistics students to other practical applications of mathematics.

Postgraduate Programmes

Honours

We offer the following full-time honours programmes:

  • STA4006W BCom (Honours) in Statistical Sciences
  • STA4007W BSc (Honours) in Statistical Sciences
  • STA4010W BBusSc in Analytics

We also offer the following fourth-year level courses for students who are not majoring in statistics:

  • STA4016H Selected honours topics (one semester course load)
  • STA4011W Selected honours topics (whole year course load)

Further information on our honours programmes is available here:

  • Course Outline
  • Module Information

To be considered for admission to any of the above courses, students must obtain an average of 65% for their third-year level statistics courses on their first attempt. At UCT, this is STA3041F/STA3043S or STA3030F/STA3036S.

All students who wish to be considered for a place in the honours programme (including BBusSc students) must complete this form before the 31 October deadline. BCom (Hons) and BSc (Hons) students must also apply to the university.

Masters in Data Science

Masters in Data Science (STA5080W & AST5005H/IBS5005W/CSC5009H/PHY5008H/STA5079H)

This is an interdisciplinary programme with participating departments: Statistical Sciences, Computer Science, Astronomy, Physics, and the Computational Biology group (Health Sciences Faculty). This programme is aimed at students who hold a good honours degree but who do not have an advanced background in Statistics and Computer Science, although they have been exposed to mathematics and computing during their undergraduate studies. Students will learn the statistical and computing skills required to deal with Big Data from Astronomy, Physics, Medicine and Commerce. This masters programme is composed of two equally weighted components. STA5080W is the coursework component (90 credits), followed by a 50% dissertation (90 credits) on a selected research topic in one of the following: Data Science in Astronomy (AST5005H), Data Science in Bioinformatics (IBS5005W), Data Science in Computer Science (CSC5009H), Data Science in Physics (PHY5008H) or Data Science in Statistical Sciences (STA5079H). The programme will be open to students with at least 65% for an honours degree in any discipline that involved a substantial component of quantitative and computing training, as assessed by a selection committee made up of representatives from the contributing departments. Successful completion of pre-courses (STA5014Z), as deemed necessary by the selection committee, might be required before a student is allowed to register for the programme. Students will be required to pass 5 compulsory and 2 elective modules. The overall mark for the coursework component will be a weighted average (based on contribution towards total credit count) of the marks obtained for the individual modules. Students will be required to pass each individual module in order to pass the coursework component of the programme. The degree will be awarded as a Master of Science specialising in Data Science.
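
The credit-weighted coursework mark described above can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the credits of the five compulsory modules come from the stream structure on this page, but the two electives chosen, all marks, and the pass mark of 50 are made-up values, and the function names are hypothetical rather than part of any official calculation.

```python
# Assumption: a module is passed at 50 or above; the page does not state the pass mark.
PASS_MARK = 50.0


def coursework_mark(modules):
    """Return the credit-weighted average of (code, credits, mark) tuples."""
    total_credits = sum(credits for _, credits, _ in modules)
    weighted_sum = sum(credits * mark for _, credits, mark in modules)
    return weighted_sum / total_credits


def passes_coursework(modules, pass_mark=PASS_MARK):
    """Every individual module must be passed, whatever the average is."""
    return all(mark >= pass_mark for _, _, mark in modules)


# Five compulsory modules plus two electives; all marks are invented.
modules = [
    ("CSC5007Z", 12, 68.0),
    ("STA5075Z", 12, 72.0),
    ("CSC5008Z", 12, 65.0),
    ("STA5077Z", 12, 70.0),
    ("STA5076Z", 18, 60.0),
    ("STA5073Z", 12, 75.0),  # elective, illustrative choice
    ("STA5074Z", 12, 66.0),  # elective, illustrative choice
]

print(f"Total credits: {sum(c for _, c, _ in modules)}")
print(f"Coursework mark: {coursework_mark(modules):.1f}")
print(f"All modules passed: {passes_coursework(modules)}")
```

Because the 18-credit module carries more weight than the 12-credit modules, a low mark there pulls the overall mark down more than the same mark in a 12-credit module would.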

Stream Structure

The structure of the General stream has more flexibility with the following compulsory core modules:

Databases for Data Scientists CSC5007Z 12 credits
Statistical and High Performance Computing STA5075Z 12 credits
Data Visualization CSC5008Z 12 credits
Unsupervised Learning STA5077Z 12 credits
Supervised Learning STA5076Z 18 credits

In order to complete 90 credits, students can choose from the following elective modules although not all modules will be offered every year; modules offered will depend on staff availability and the course will be tailored to the interests and needs of the particular students.

Data Science for Astronomy AST5004Z 12 credits
Data Science for Particle Physics PHY5007Z 12 credits
Bioinformatics for high-throughput biology IBS5004Z 15 credits
Data Science for Industry STA5073Z 12 credits
Decision Modelling for Prescriptive Analytics STA5074Z 12 credits
Bayesian Decision Modelling STA5061Z 15 credits
Data Analysis for High Frequency Trading STA5091Z 12 credits

Any other masters modules in Statistical Sciences or Computer Science. Specific entry requirements might apply to these modules. For more information about the general stream please contact Celene.Jansen-Fielies@uct.ac.za

Masters Programmes

The Department of Statistical Sciences offers the following masters programmes:

Coursework Masters Degrees

  • Masters in Data Science (STA5080W & AST5005H/ IBS5004H/ CSC5009H/ PHY5008H/ STA5079H)
  • Masters in Advanced Analytics and Decision Sciences by course work and half dissertation (STA5003W & STA5004W)
  • Masters in Biostatistics (STA5057W & STA5058W)

Dissertation Masters Degrees

  • Masters in Mathematical Statistics by dissertation only (STA5000W)
  • Masters in Operational Research by dissertation only (STA5001W)
  • Masters in Ecological/Environmental Statistics by dissertation only (STA5013W)

Module Information

  • Detailed description of Masters modules

Application Process

  • Entrance requirements and application forms

Doctoral Programmes

STA6001W: PhD in Statistical Sciences

The topic of the PhD degree is decided in conjunction with a supervisor. Although every effort will be made to link potential students with a supervisor in the field of the submitted research proposal, it remains the responsibility of the applicant to secure a commitment from a suitable supervisor. The research fields of our staff include Astrostatistics, Biostatistics and Bioinformatics, Ecological statistics, Econometrics and Financial modelling, Multivariate statistics, Decision modelling, Problem structuring and project management, Stochastic processes, Spatial statistics and Statistical Education. For more information on the specific specialisations of staff members, see the Academic staff page.

Entrance requirements

A relevant Masters programme demonstrating research ability. Please note that the Department reserves the right to accept you for Masters rather than PhD registration. At the end of one year, your progress will be assessed by the departmental postgraduate committee. Your registration may then be upgraded to a PhD, remain as an MSc, or be terminated, depending on your progress.

Application procedure

To apply to the department, send an e-mail containing the following to Ms Celene Jansen-Fielies (celene.jansen-fielies@uct.ac.za):

  • Completed expression of interest form
  • Full academic transcripts of all courses not completed at UCT
  • A two-page CV
  • A two-page research proposal

Students are welcome to initiate the application process at any time during the academic year, although registration usually takes place in February or July.

Once the department has indicated provisional acceptance into the PhD programme, the official application is made to the Science Faculty by completing the online application form: www.uct.ac.za/apply/applications/forms

Financing

You need to ensure sufficient funds to cover your fees and living expenses. A limited number of university bursaries and other bursaries are available.

You need to apply separately for such funding (http://www.uct.ac.za/apply/funding/postgraduate/applications). A limited number of tutoring positions are available in the department. The salary would depend on your duties and typically provides not more than R1500 per month for eight or nine months of the year. Note that an offer/acceptance into a postgraduate programme does not automatically ensure or entitle you to a tutorship. The department does not offer any financial assistance to students and it is imperative that students ensure coverage of their own financial needs before they arrive at UCT.

Language requirements

The official language of the university is English. Students may be required to undertake an English proficiency test.

For more information on postgraduate studies at UCT (application procedure, funding and rules), please consult: http://www.uct.ac.za/apply/applications/postgraduates

Note that the department’s approval of your application is a requirement of registration, but the Faculty may have additional requirements.

Short Courses & Workshops

The Department of Statistical Sciences will be offering the following short courses and workshops in 2018:

  1. Workshop: Chain Event graphs in modelling complex health and medical data (3 – 6 April 2018)
  2. Short Course: Mathematical Modelling for Infectious Diseases (16 – 26 April 2018)
  3. Short Course: Data Science for Industry (23 July – 5 September 2018)