# Statistics: Theory and Practice

## Overview

• Credit value: 30 credits at Level 6
• Assessment: a three-hour examination in the summer term (80%) and coursework (20%)

## Module description

This module gives you an overview of the main theoretical ideas that underpin both routine and innovative uses of statistical theory and its applications. It follows on from the Probability and Statistics module and develops a more in-depth understanding.

For those focusing on statistics, the module provides some of the prerequisite knowledge for modules on final-year undergraduate and Master's programmes in statistics. It can also serve as a more complete ‘stopping-off’ point for mathematicians who wish to push their statistical knowledge beyond an introductory level.

On the computational side, by studying this module you will gain a working knowledge of a high-level statistical programming language, such as R.

Teaching for this module will take place throughout the year, with eight evenings in each of the autumn and spring terms and two evenings of revision and consolidation in the summer term.

### Indicative module content

#### Probability and distribution theory

• Probability spaces, review of conditional probability and independence
• Discrete and continuous random variables and their moments
• Functions of random variables, with emphasis on generating functions
• Collections of random variables, conditional distributions and expectation
• The multivariate normal distribution, with emphasis on the bivariate normal

#### Introduction to statistical inference

• Point and interval estimation (with examples relating to the normal distribution)
• Introduction to hypothesis testing (with examples relating to the normal distribution)
• Likelihood and sufficiency, the Factorization Theorem
• Maximum likelihood estimators

#### Completely randomized one-way design

• Introduction to R
• Design and analysis of completely randomized one-way design (theory and practice in R)
• The chi-square and F distributions, and their relationship to analysis of variance techniques
• Least squares estimators
• Estimation and comparison of treatment effects
• Analysis of residuals
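
As a flavour of the analysis listed above, the following is a minimal sketch of the F statistic for a completely randomized one-way design. The module itself works in R; Python is used here purely for illustration, and the function name `one_way_anova` and the data are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F statistic, between df, within df) for a one-way layout.

    `groups` is a list of lists, one list of observations per treatment.
    """
    k = len(groups)                    # number of treatments
    n = sum(len(g) for g in groups)    # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Between-treatment sum of squares: spread of the treatment means
    # around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    # Within-treatment (residual) sum of squares: spread of each
    # observation around its own treatment mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up responses for three treatments, three replicates each.
responses = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
f_stat, df_between, df_within = one_way_anova(responses)
print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) degrees of freedom")
```

Under the usual normal-error assumptions, the statistic would be compared with an F distribution on the stated degrees of freedom. This is where the chi-square and F distributions in the list above enter.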

#### Linear regression

• Simple linear regression, analysis of residuals and prediction
• Multiple linear regression, ANOVA, testing redundancy
• Stepwise regression
• Modelling linear regression using R
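
For a sense of what simple linear regression involves, here is a minimal sketch of the least squares fit and its residuals. It is an illustration only: the module uses R, whereas this sketch is written in Python, and the function name `fit_simple_regression` and the data are hypothetical.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Least squares fit of y = intercept + slope * x.

    Returns (intercept, slope, residuals).
    """
    x_bar, y_bar = mean(x), mean(y)

    # Corrected sums of squares and cross-products.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residuals: observed response minus fitted value.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals


# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 10.0]
intercept, slope, residuals = fit_simple_regression(x, y)
print(f"fitted line: y = {intercept:.2f} + {slope:.2f} x")
```

When an intercept is fitted, the residuals sum to zero. Plotting them against the fitted values is a common first check of the model assumptions.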

## Learning objectives

By the end of this module, you will be able to:

• set up and carry out (or supervise the statistical implementation of) a simple designed experiment which allows for the testing of the influence of certain factors using ANOVA techniques
• collate and analyse data arising from a simple designed experiment in a statistical package (such as R), and draw appropriate conclusions
• specify and recognise the joint distribution of several random variables given appropriate assumptions on the marginal distributions and their dependence structure
• specify and recognise the multivariate normal distribution and some of its important properties, in particular the graphical properties of the bivariate normal distribution
• derive key results pertaining to the chi-square and F (Fisher) distributions, and relate these to the theoretical basis for the ANOVA technique
• formulate and derive maximum likelihood estimators (and appreciate how these differ from those based on the method of moments)
• determine whether a statistic is sufficient for a given parameter
• appreciate the theoretical underpinning of hypothesis testing and be aware of how hypothesis tests are carried out under several different paradigms
• determine whether a given data set is amenable to analysis using multiple linear regression
• import or enter data into a statistical package, such as R, and perform multiple linear regression principally using command-line functions (rather than menu-driven GUI operations)
• interpret and draw conclusions from a statistical analysis, and present these conclusions so that they can either (i) be well understood by a statistician, or (ii) be accessible, in a non-misleading way, to the intelligent layperson or non-statistician (who may be involved in policy development).