statistics

MVUE : Minimum-variance unbiased estimator

Continuous random variables can assume any value within one or more intervals (e.g. the length of time between visits to a doctor).

These can be represented through probability density (frequency) functions:

The area under the curve (curve = frequency/density function) should sum to 1. P(a < x < b) is the area under the curve between a & b. Finding this area is often a use for calculus. You can do it on a ti-89 by looking for the “Cumulative Distribution Function (CDF)“.
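As a sketch of the idea (not how you'd normally compute it), the area under a density curve can be approximated numerically; the density and interval below are made-up examples:

```python
# Approximate P(a < x < b) as the area under a density curve f
# using a simple midpoint Riemann sum (illustrative; a CDF is the usual tool).
def area_under_curve(f, a, b, steps=100_000):
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# Example density: uniform on [0, 10], so f(x) = 1/10 on that interval.
f = lambda x: 1 / 10
print(area_under_curve(f, 2, 5))  # ≈ 0.3
```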

Uniform distribution

This means everything is equally likely. The curve is basically a rectangle.

Probability density function: f(x) = 1/(b − a) for a ≤ x ≤ b (0 elsewhere). Mean: μ = (a + b)/2. Standard Deviation: σ = (b − a)/√12.
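A minimal sketch of these formulas in Python (the interval endpoints below are example values):

```python
# Uniform distribution on [a, b]: density, mean, and standard deviation.
import math

uniform_pdf = lambda x, a, b: 1 / (b - a) if a <= x <= b else 0
uniform_mean = lambda a, b: (a + b) / 2
uniform_stddev = lambda a, b: (b - a) / math.sqrt(12)

# P(c < x < d) inside [a, b] is just the rectangle's area:
uniform_prob = lambda c, d, a, b: (d - c) / (b - a)

print(uniform_mean(0, 10))        # 5.0
print(uniform_prob(2, 5, 0, 10))  # 0.3
```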

Normal Distribution

It’s a bell curve.

Density function: f(x) = (1/(σ√(2π))) e^(−(1/2)((x−μ)/σ)²). P(x < a) is determined from a table of probabilities after converting a to z = (a − μ)/σ. Note that (x − μ)/σ is the same as the z-score.

The “standard normal distribution” is a special case of the above formula, where μ = 0 and σ = 1.

Seems most of the math for determining area under the curve takes into account that the full area under the curve = 1 and the curve is symmetrical about the mean. From there, we can do simple logic to work out an area’s value b/c the tables give us the area from 0 to z.

Approximating binomial distribution with the Normal Distribution

You can approximate the binomial distribution (a discrete distribution) with the normal distribution (continuous).

You need to account for the “correction for continuity”, which means that you add or subtract 0.5 to a discrete x value. So where in a binomial distribution you would check P(x ≤ a), in a normal distribution you check P(x ≤ a + .5).

This only works with sufficiently high n’s: np ≥ 5 and n(1 − p) ≥ 5 must hold true.

Another way to think about sufficiency is that μ ± 3σ (aka np ± 3√(npq)) must be contained in the discrete value’s range (0 to n) for it to be a fit.

From there, you calculate the z-score, z = ((a + .5) − μ)/σ (where the +.5 is the correction for continuity), then look up the probability in a table/using tech for that given z-score.
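The steps above can be sketched end-to-end; n, p, and the cutoff are example values, and the exact binomial sum is included only to show how close the approximation gets:

```python
# Approximate a binomial probability with the normal distribution,
# using the 0.5 continuity correction, and compare against the exact answer.
import math
from statistics import NormalDist

n, p = 100, 0.4                       # example values
mu = n * p                            # binomial mean
sigma = math.sqrt(n * p * (1 - p))    # binomial standard deviation

# Exact: P(x <= 35) for Binomial(n, p)
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(36))

# Approximate: P(x <= 35 + .5) under the normal curve
z = (35 + 0.5 - mu) / sigma
approx = NormalDist().cdf(z)

print(exact, approx)  # the two values should be close
```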

Exponential distribution

This is used to model things like the length of time or distance between occurrences of random events. It’s sometimes called the “waiting-time distribution”.

Density function: f(x) = (1/θ) e^(−x/θ) for x > 0. Mean/stddev: μ = σ = θ.

To calculate probability, P(x > a) = e^(−a/θ).
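A small sketch of these exponential-distribution probabilities (the a and θ values are arbitrary examples):

```python
# Exponential distribution with mean theta.
import math

# P(x > a) = e^(-a/theta)
exp_prob_greater = lambda a, theta: math.exp(-a / theta)
# P(x <= a) = 1 - e^(-a/theta)  (the CDF)
exp_prob_at_most = lambda a, theta: 1 - math.exp(-a / theta)

print(exp_prob_greater(5, 2))  # e^(-2.5) ≈ 0.082
```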

Some useful Python code for this:

from statistics import NormalDist
 
# z-score for a value, given the distribution's mean and standard deviation
get_zscore = lambda x, mean, stddev: (x - mean) / stddev
# P(z1 < Z < z2), assuming z1 < z2
prob_between_zscore = lambda z1, z2: NormalDist().cdf(z2) - NormalDist().cdf(z1)
 
# Getting a value from a percentile, mean and stddev
value_from_percentile = lambda p, mean, stddev: NormalDist().inv_cdf(p) * stddev + mean

Sampling distribution of x̄.

  1. The mean of the sampling distribution of x̄ equals the mean of the sampled population: μ_x̄ = μ.
  2. The standard deviation of the sampling distribution of x̄ equals the sampled population’s std deviation divided by the square root of the sample size: σ_x̄ = σ/√n. The standard deviation of the sampling distribution (σ_x̄) is often referred to as the “standard error of the mean”.

If the sample size in 2 above is large relative to the population (5% of the population or more), σ_x̄ must be multiplied by the “finite population correction factor” √((N − n)/(N − 1)). In most sampling situations, this factor will be close to 1 and can be ignored.

Theorem:

  • If a random sample of n observations is selected from a population with a normal distribution, the sampling distribution of x̄ will be a normal distribution.
  • Central Limit Theorem: if the sample size n is large enough (usually n ≥ 30), the sampling distribution of x̄ is approximately normal regardless of the shape of the population’s distribution.
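Properties 1 and 2 and the Central Limit Theorem can be checked empirically; the sketch below draws many samples from a decidedly non-normal (uniform) population, with the sample size and population chosen arbitrarily:

```python
# Empirical check: mean and standard error of the sampling distribution of
# the sample mean, drawn from a uniform(0, 10) population.
import random
import statistics

random.seed(0)
n = 50                                  # sample size (example value)
pop_mean, pop_sd = 5.0, 10 / 12**0.5    # uniform(0, 10) mean and stddev

sample_means = [
    statistics.fmean(random.uniform(0, 10) for _ in range(n))
    for _ in range(5_000)
]

# Mean of the sampling distribution ≈ population mean
print(statistics.fmean(sample_means))
# Standard deviation of the sampling distribution ≈ sigma / sqrt(n)
print(statistics.stdev(sample_means), pop_sd / n**0.5)
```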

Sampling distribution of the sample proportion

This is “how many voters are in favor of bill X?“. The binomial proportion is known as p. The sample proportion of ^^ is known as p̂ and is a good estimator of the population proportion p.

  1. Mean of the sampling distribution is equal to the true binomial proportion, p; that is, μ_p̂ = p. Also, p̂ is an unbiased estimator of p.
  2. Standard deviation of the sampling distribution is σ_p̂ = √(pq/n), where q = 1 − p.
  3. For large samples, the sampling distribution is approximately normal. (Large = np ≥ 5 and nq ≥ 5.)
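Putting these properties together, a probability about p̂ can be read off the normal curve; the p and n below are example values:

```python
# Sampling distribution of the sample proportion p-hat for a known p.
import math
from statistics import NormalDist

p, n = 0.55, 400            # true proportion and sample size (example values)
q = 1 - p

mean_p_hat = p                       # property 1
se_p_hat = math.sqrt(p * q / n)      # property 2

# Property 3: for large n, p-hat is approximately normal, so e.g.
# P(p-hat > 0.50) can be read off the normal curve:
z = (0.50 - mean_p_hat) / se_p_hat
prob = 1 - NormalDist().cdf(z)
print(prob)
```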