Correlation coefficients cannot be mathed: they aren’t on a ratio scale, so a +.80 coefficient isn’t twice as strong as +.40.
They can be thought of through:
- Consistency: the relative degree of consistency with which Ys are paired with Xs. At +1, everyone who had a given X got the same Y, and that holds for all Xs.
- Variability: at ±1, there is no variability in Y within any given X.
- The scatterplot: how closely the data points hug the linear regression line.
- Predictions: communicates the relative accuracy of our predictions of Y from a given X.
Definitions
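variability : how much the Y scores differ at a given X. It works as the opposite of correlation: more variability means a weaker correlation.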
correlation coefficient : the numerical representation of a relationship between two things. It’s always rounded to 2 decimals. In real experiments, is considered weak and is considered extremely strong.
regression line : the straight line through a scatterplot that summarizes the relationship, running through the center of the Y scores at each X.
scatterplot : a graph of individual data points from a set of x-y pairs. The X value in a scatterplot is the “given”, e.g. “Given cups of coffee consumed, how nervous are people?”
strength of a relationship : (aka degree of association) how much/consistently one value of Y is associated with one and only one value of X. Consistency ranges from either 0-1 (max = 1) or -1 to +1 (if signed).
perfect correlation : a correlation coefficient of -1 or +1. There is no variability in Y at any given X.
pearson correlation coefficient : represented as r. Describes the linear relationship between two interval or ratio variables.
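restricted range : when the range of X or Y scores in the sample is too narrow. A restricted range tends to make the correlation coefficient underestimate the real relationship, so we need a reasonable breadth of scores on both variables.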
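sampling distribution of r : the distribution of r values you would get if you took an infinite number of samples of the same N from the population, computing r for each. When the population correlation is 0, it is approximately a normal distribution centered on 0.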
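The sampling distribution of r can be approximated by simulation. This is a minimal sketch, assuming a population in which X and Y are unrelated (true correlation of 0), with 2,000 samples standing in for "an infinite number". The sample size, seed, and the `pearson_r` helper are illustrative choices, not part of these notes.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson r via the computational formula (sums of X, Y, X^2, Y^2, XY)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


random.seed(0)      # assumption: fixed seed so the run is repeatable
n_pairs = 10        # assumption: N pairs per sample
n_samples = 2000    # finite stand-in for "an infinite number of samples"

rs = []
for _ in range(n_samples):
    # X and Y are drawn independently, so the population correlation is 0.
    xs = [random.gauss(0, 1) for _ in range(n_pairs)]
    ys = [random.gauss(0, 1) for _ in range(n_pairs)]
    rs.append(pearson_r(xs, ys))

mean_r = statistics.mean(rs)
sd_r = statistics.stdev(rs)
print(f"mean r = {mean_r:.3f}, SD of r = {sd_r:.3f}")
```

The individual r values scatter around 0 even though nothing is related in the population, which is why a single observed r has to be compared against this distribution before calling it significant.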
Correlational Analysis
The four main differences compared to an experiment:
- In an experiment you look at ‘x’ and ‘y’ and then ‘x and y’. For correlational analysis, you look at the ‘x-y pairs’ only.
- b/c we look at all the pairs, correlational analysis is single-sample. N=number of pairs.
- X is determined by the question. “Given an amount of coffee, how nervous are people?” would mean that X = amount of coffee.
- Data is graphed on a scatterplot.
Types of relationships
“As x changes, the y’s…”
- linear: it follows a single straight line. It can go up (positive linear relationship) or down (negative linear relationship)
- nonlinear: aka curvilinear. For example, “given age, how fast are you?” would be an inverted-U-shaped relationship: speed goes up with age, then back down.
Pearson correlation coefficient
This determines the “average” amount that the X and Y scores correspond. We translate each X and Y into its z-score (z_x and z_y) and compare them.
Requires:
- X and Y scores each form an approximately normal distribution
- Neither X nor Y has a restricted range.
At a high level, we compare the z-scores of each X-Y pair.
This is the janky “defining formula”, which yields too many rounding errors:
\begin{math} r = \frac{\Sigma(z_x z_y)}{N} \end{math}
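The defining formula above can be sketched directly in code. This is an illustration, not a recommended way to compute r. It assumes the z-scores use the population standard deviation (dividing by N), which is what makes dividing the sum of products by N come out to exactly ±1 for a perfect relationship. The function names and toy data are hypothetical.

```python
import math


def z_scores(values):
    """Convert raw scores to z-scores using the population SD (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_r_defining(xs, ys):
    """r = sum(z_x * z_y) / N, the 'defining formula'."""
    zx = z_scores(xs)
    zy = z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Hypothetical toy data: Y rises by exactly 2 for every 1-unit rise in X,
# so every X is paired with one and only one Y (a perfect positive relationship).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

r = pearson_r_defining(xs, ys)
print(round(r, 2))
```

Each pair contributes the product of its two z-scores, so pairs where X and Y sit on the same side of their means push r up, and pairs on opposite sides pull it down.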
To compute r, we instead use:
\begin{math} r = \frac{N(\Sigma XY) - (\Sigma X)(\Sigma Y)}{\sqrt{[N(\Sigma X^2)-(\Sigma X)^2] [N(\Sigma Y^2)-(\Sigma Y)^2]}} \end{math}
It’s worth noting that ΣXY is the “sum of the cross products”, i.e. multiply each X by its paired Y, then sum all of the (x*y) products.
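The computational formula maps onto code term by term. This is a minimal sketch: the function name `pearson_r` and the toy data are hypothetical, and it does no input checking (for example, it will divide by zero if every X or every Y is the same).

```python
import math


def pearson_r(xs, ys):
    """Pearson r from the five sums used in the computational formula."""
    n = len(xs)                                   # N = number of X-Y pairs
    sum_x = sum(xs)                               # sum of X
    sum_y = sum(ys)                               # sum of Y
    sum_x2 = sum(x * x for x in xs)               # sum of X^2
    sum_y2 = sum(y * y for y in ys)               # sum of Y^2
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of the cross products

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Hypothetical toy data: Y drops by exactly 2 for every 1-unit rise in X.
xs = [1, 2, 3, 4]
ys = [8, 6, 4, 2]
print(pearson_r(xs, ys))  # -1.0
```

Note the difference between the sum of X^2 (square first, then sum) and the squared sum of X (sum first, then square); mixing them up is the easiest mistake to make with this formula.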
Example
X = Glasses of Juice per Day, Y = Doctor Visits per Year
Participant | X | X^2 | Y | Y^2 | XY |
---|---|---|---|---|---|
1 | 0 | 0 | 8 | 64 | 0 |
2 | 0 | 0 | 7 | 49 | 0 |
3 | 1 | 1 | 7 | 49 | 7 |
4 | 1 | 1 | 6 | 36 | 6 |
5 | 1 | 1 | 5 | 25 | 5 |
6 | 2 | 4 | 4 | 16 | 8 |
7 | 2 | 4 | 4 | 16 | 8 |
8 | 3 | 9 | 4 | 16 | 12 |
9 | 3 | 9 | 2 | 4 | 6 |
10 | 4 | 16 | 0 | 0 | 0 |
N = 10 | Σ X = 17 | Σ X^2 = 45 | Σ Y = 47 | Σ Y^2 = 275 | Σ XY = 52 |
| | (Σ X)^2 = 289 | | (Σ Y)^2 = 2209 | | |
\begin{align} r &= \frac{N(\Sigma XY) - (\Sigma X)(\Sigma Y)}{\sqrt{[N(\Sigma X^2)-(\Sigma X)^2] [N(\Sigma Y^2)-(\Sigma Y)^2]}} \\ &= \frac{10(52) - (17)(47)}{\sqrt{[10(45)-289] [10(275)-2209]}} \\ &= \frac{520 - 799}{\sqrt{[450-289] [2750-2209]}} \\ &= \frac{-279}{\sqrt{[161] [541]}} \\ &= \frac{-279}{\sqrt{87101}} \\ &= \frac{-279}{295.129} \\ &= -.95 \end{align}
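The arithmetic above can be double-checked in a few lines. This sketch re-enters the ten X-Y pairs from the table and recomputes each sum before plugging them into the computational formula. The variable names are illustrative.

```python
import math

# The ten participants from the table: X = glasses of juice per day,
# Y = doctor visits per year.
juice = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4]
visits = [8, 7, 7, 6, 5, 4, 4, 4, 2, 0]

n = len(juice)
sum_x = sum(juice)
sum_x2 = sum(x * x for x in juice)
sum_y = sum(visits)
sum_y2 = sum(y * y for y in visits)
sum_xy = sum(x * y for x, y in zip(juice, visits))
print(sum_x, sum_x2, sum_y, sum_y2, sum_xy)  # 17 45 47 275 52

numerator = n * sum_xy - sum_x * sum_y                                # -279
under_root = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)    # 161 * 541 = 87101
r = numerator / math.sqrt(under_root)
print(round(r, 2))  # -0.95
```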
Significance testing
df = N - 2, where N is the number of X-Y pairs.
Steps:
- Compute r (the obtained value, r_obt).
- Define H0 and Ha as one- or two-tailed.
- Find the critical value, r_crit, using df.
- If r_obt is beyond r_crit, then it’s significant.
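The steps above can be sketched as a small decision function. Several things here are assumptions rather than part of these notes: the function name and its `tail` argument are hypothetical, df is taken as N - 2 (the standard df for Pearson r), "beyond" is read as strictly more extreme than the critical value, and the critical value .632 is the standard table entry for df = 8, two-tailed, alpha = .05.

```python
def is_significant(r_obt, r_crit, tail="two"):
    """Compare the obtained r against the critical r.

    tail: "two" for a two-tailed test, "positive" for a one-tailed test
    predicting r > 0, "negative" for a one-tailed test predicting r < 0.
    """
    r_crit = abs(r_crit)  # tables list the critical value without a sign
    if tail == "two":
        return abs(r_obt) > r_crit
    if tail == "positive":
        return r_obt > r_crit
    if tail == "negative":
        return r_obt < -r_crit
    raise ValueError("tail must be 'two', 'positive', or 'negative'")


n_pairs = 10        # N = number of X-Y pairs in the juice example
df = n_pairs - 2    # assumption: df = N - 2 for Pearson r
r_obt = -0.95       # obtained r from the juice example
r_crit = 0.632      # assumption: table value for df = 8, two-tailed, alpha = .05

print(df, is_significant(r_obt, r_crit, tail="two"))  # 8 True
```

With a one-tailed test the direction matters: the same r of -.95 would not be significant if the hypothesis had predicted a positive relationship.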