Unit 1: Introduction to Statistics
Statistics is the science of collecting, organising, summarising and analysing data to draw conclusions from it. It can be divided into two branches:
- Descriptive statistics, which describes, summarises and organises data.
- Inferential statistics, which draws conclusions from said data.
As with many disciplines, the scientific method can be easily applied to statistics.
Measures of Central Tendency
A measure of central tendency is a value in a set of data that can be described as being in the centre of the (sorted) data. The most common measures of central tendency are the arithmetic mean, the weighted mean, the median and the mode.
In the case of the arithmetic mean, it is also possible to calculate it when we have a table of frequencies or groups of data with class marks. Similarly, we can calculate the median for a table of frequencies.
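As a minimal sketch (the class marks and frequencies below are made up), the mean from a frequency table is the frequency-weighted average of the class marks, and expanding the table back into repeated observations lets us take the ordinary median:

```python
from statistics import median

# Hypothetical frequency table: class marks and their frequencies.
class_marks = [5, 15, 25, 35, 45]
freqs = [2, 8, 15, 10, 5]

# Mean from a frequency table: weighted average of the class marks.
n = sum(freqs)
mean = sum(x * f for x, f in zip(class_marks, freqs)) / n

# Median: expand the table into the repeated observations and
# take the ordinary median of that list.
expanded = [x for x, f in zip(class_marks, freqs) for _ in range(f)]
med = median(expanded)
```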
Measures of Position
A quantile is a value that serves as a measure of position. It can be seen as the value up to which a random variable accumulates a given probability. Percentiles, quartiles and deciles are specific cases of quantiles.
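For instance, Python's standard `statistics.quantiles` computes cut points directly (the data below are made up; the default "exclusive" method is one of several conventions for interpolating between observations):

```python
from statistics import quantiles

data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 15]  # hypothetical sample

# Quartiles split the sorted data into four equal-probability parts;
# quantiles(n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(data, n=4)

# Deciles and percentiles are the same idea with n=10 and n=100,
# giving 9 and 99 cut points respectively.
deciles = quantiles(data, n=10)
```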
Measures of Dispersion
Measures of Shape
Measures of Association
Statistical data can be represented graphically in many types of graphs, which include, but are not limited to:
- Bar charts, where classes of a variable are associated with a frequency bar.
- Histograms, where the frequencies of values grouped into intervals (bins) are shown as adjacent bars.
- Dispersion graphs (scatter plots), which show the relation between two variables.
- Bubble graphs, which show the relation between three variables.
- Time series, which show the evolution of a variable across time.
- Box plots, which summarise a series of numerical data with the help of quartiles.
Unit 2: Methods for Obtaining Functions of Random Variables
(Not so Brief) Probability Theory Review
We reviewed probability theory, including the concepts of random variable (discrete and continuous), density function, distribution function, random vector, and moment-generating function, to name just a few.
I created no new notes for these concepts, since I had already seen all of them during my Probabilidad course (🏫). I did improve and edit some of them, however.
When defining a random variable in terms of another, we can find its density and distribution function by using one of two methods.
Collections of iid Random Variables
We say that a collection of random variables is iid when all of them are mutually independent and have the same probability distribution.
When ordering the random variables in a random sample according to their realisations, we can define the concept of order statistics.
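A tiny illustration (sample and seed made up): sorting the realisations of an iid sample gives the order statistics, of which the minimum, median and maximum are special cases:

```python
import random

random.seed(0)

# A hypothetical iid sample from the Uniform(0, 1) distribution.
sample = [random.random() for _ in range(7)]

# Sorting the realisations gives the order statistics X_(1) <= ... <= X_(n).
order_stats = sorted(sample)

minimum = order_stats[0]                      # X_(1), the sample minimum
maximum = order_stats[-1]                     # X_(n), the sample maximum
middle = order_stats[len(order_stats) // 2]   # the sample median (n odd)
```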
More Probability Distributions
When we have a collection of random variables that follow a standard normal distribution, the random variable that results from adding their squares up follows what we call a chi-squared distribution (denoted, of course, with the Greek letter chi: χ²).
Student’s t-distribution describes the distribution of a random variable obtained by dividing a standard normal random variable by the square root of an independent chi-squared random variable divided by its degrees of freedom.
Similarly, the ratio of two independent chi-squared random variables, each divided by its own degrees of freedom, follows an F distribution.
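The chi-squared construction can be checked by simulation (a Monte Carlo sketch with made-up settings): a chi-squared variable with k degrees of freedom has mean k and variance 2k, so sums of k squared standard normals should show roughly those moments:

```python
import random

random.seed(42)

k = 3           # degrees of freedom
reps = 50_000   # Monte Carlo repetitions

# Each draw is the sum of squares of k independent standard normals,
# which should follow a chi-squared distribution with k degrees of freedom.
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(reps)]

mean_hat = sum(draws) / reps                                    # near k
var_hat = sum((d - mean_hat) ** 2 for d in draws) / (reps - 1)  # near 2k
```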
Unit 3: Sampling Distributions
Random Samples, Sample Statistics and Parameters
A sample statistic (”estadístico” in Spanish) is a random variable that results from passing a random sample through a function. The corresponding distributions of such random variables are called sampling distributions.
A parameter is a numerical characterisation of a population that describes partially or completely its probability distribution. The set of all possible values that a parameter can have is called its parameter space.
Expected Value, Variance and Moment-Generating Function Theorems
When defining a random variable as the sum of the random variables in a random sample multiplied by some scalar each, its moment-generating function can be easily determined.
When we know the expected value and variance of the random variables of our random sample, then obtaining the expected value and variance of the mean of our random sample is easy.
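In symbols (the standard statements, reconstructed here rather than quoted from the course): for a random sample X₁, …, Xₙ with E[Xᵢ] = μ, Var(Xᵢ) = σ² and moment-generating functions M_{Xᵢ}, independence gives

```latex
M_{\sum_{i} a_i X_i}(t) = \prod_{i=1}^{n} M_{X_i}(a_i t),
\qquad
E\left[\bar{X}\right] = \mu,
\qquad
\operatorname{Var}\left(\bar{X}\right) = \frac{\sigma^2}{n}.
```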
Central Limit Theorem
The central limit theorem is an important theorem that allows us to use a standard normal distribution to approximate the (standardised) sum of iid random variables with any distribution, provided it has finite mean and variance.
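A Monte Carlo sketch (settings made up): standardising sums of 30 Uniform(0, 1) variables should produce values whose sample mean is near 0 and sample variance near 1, as the CLT predicts:

```python
import math
import random

random.seed(1)

n = 30          # summands per draw
reps = 20_000   # Monte Carlo repetitions

# Each X_i ~ Uniform(0, 1), so mu = 1/2 and sigma^2 = 1/12.
mu, var = 0.5, 1 / 12

# Standardised sums (S_n - n*mu) / sqrt(n*var) should be roughly N(0, 1).
z = [
    (sum(random.random() for _ in range(n)) - n * mu) / math.sqrt(n * var)
    for _ in range(reps)
]

z_mean = sum(z) / reps                # should be near 0
z_var = sum(v * v for v in z) / reps  # should be near 1
```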
Sample Variance Theorems
Given a normally distributed random sample, we can use a chi-squared distribution to work with its variance. We can do something similar even when we don’t know the expected value of the random variables, albeit with a slightly different sample variance.
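The two standard results (assuming X₁, …, Xₙ iid N(μ, σ²); reconstructed, not quoted from the course) are

```latex
\frac{1}{\sigma^2}\sum_{i=1}^{n}\left(X_i - \mu\right)^2 \sim \chi^2_{n}
\qquad\text{and}\qquad
\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},
\quad\text{where}\quad
S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2,
```

the first applying when μ is known and the second when it must be replaced by the sample mean, at the cost of one degree of freedom.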
Unit 4: Point and Interval Estimation
Given a random sample characterised by a parameter, an estimator is a sample statistic that aims to estimate (i.e. give an approximation of) such a parameter.
An estimator can be:
- Unbiased when its expected value is equal to the parameter it estimates.
- Asymptotically unbiased (in the case of a sequence of estimators) when, as the size of the sequence increases, the expected values converge to the estimated parameter.
- More efficient than another estimator when its variance is smaller.
- Consistent (in the case of a sequence of estimators) when, as the size of the sequence increases, they converge to the estimated parameter.
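To illustrate bias versus asymptotic unbiasedness (a Monte Carlo sketch with made-up settings): the "divide by n" variance estimator has expected value ((n − 1)/n)σ², so it is biased for any fixed n but asymptotically unbiased:

```python
import random

random.seed(7)

n = 5            # (small) sample size
reps = 40_000    # Monte Carlo repetitions

def biased_var(xs):
    # Divides by n rather than n - 1, so E[estimator] = (n-1)/n * sigma^2:
    # biased for any fixed n, but the bias vanishes as n grows.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# True variance is 1 for N(0, 1), so the average estimate should be
# near (n-1)/n = 0.8 rather than 1.0.
estimates = [biased_var([random.gauss(0, 1) for _ in range(n)]) for _ in range(reps)]
avg = sum(estimates) / reps
```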
We can use several methods to obtain estimators for a given parameter:
- The moments method consists of equating the population moments with their corresponding sample moments, and then solving the resulting equation(s).
- The maximum likelihood method consists of maximising the random sample’s likelihood function.
  - As its name suggests, this method uses a random sample’s likelihood function, which is the product of the density functions of each random variable in the sample.
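A case where both methods agree (a sketch with made-up data): for an exponential distribution with rate λ, the population mean is 1/λ, so the method of moments solves 1/λ = x̄; maximising the log-likelihood n ln λ − λ Σxᵢ gives the same estimator, λ̂ = 1/x̄:

```python
import random

random.seed(3)

true_rate = 2.0
sample = [random.expovariate(true_rate) for _ in range(10_000)]

# Method of moments: equate the population mean 1/lambda with the
# sample mean and solve for lambda.
x_bar = sum(sample) / len(sample)
lambda_mom = 1 / x_bar

# Maximum likelihood: the log-likelihood n*ln(lambda) - lambda*sum(x)
# is maximised at the same value, lambda_hat = 1 / x_bar.
lambda_mle = len(sample) / sum(sample)
```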
Cramér-Rao Lower Bound
Given an unbiased estimator of a parameter, the Cramér-Rao lower bound provides a lower bound for its variance.
An estimator is said to be a uniformly minimum-variance unbiased estimator (UMVUE) when (1) it’s unbiased and (2) its variance is equal to the Cramér-Rao lower bound.
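The standard statement (for an unbiased estimator θ̂ built from an iid sample of size n, under the usual regularity conditions; I(θ) denotes the Fisher information of a single observation):

```latex
\operatorname{Var}\bigl(\hat{\theta}\bigr) \;\ge\; \frac{1}{n\,I(\theta)},
\qquad
I(\theta) = E\!\left[\left(\frac{\partial}{\partial\theta}\,\ln f(X;\theta)\right)^{\!2}\right].
```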