statistics

Correlation coefficients cannot be mathed. A +.80 coefficient isn’t twice as strong as +.40.

They can be thought of through:

  1. Consistency: the relative degree of consistency with which Ys are paired with Xs. At ±1, everyone who has a given X has the same Y, and that holds for every X.
  2. Variability: at ±1 there is no variability among the Y scores within any given X; the weaker the correlation, the more spread there is.
  3. The scatterplot: how closely the data points cluster around the regression line; the stronger the correlation, the tighter the cluster.
  4. Predictions: communicates the relative accuracy of predicting Y from a given X.
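One standard way to see why coefficients can’t be compared arithmetically (not spelled out in these notes, so treat it as an aside): squaring r gives the proportion of variance in Y accounted for by X, and it is r² that scales meaningfully.

\begin{math} r = +.40 \Rightarrow r^2 = .16 \qquad\qquad r = +.80 \Rightarrow r^2 = .64 \end{math}

So a +.80 correlation accounts for four times as much variance as a +.40 correlation, not twice as much.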

Definitions

variability : the spread of the Y scores paired with a given X; in this sense it is the opposite of correlation, since more variability at each X means a weaker correlation.

correlation coefficient : the numerical representation of the strength and direction of a relationship between two variables. It’s always rounded to 2 decimals. In real experiments, coefficients near 0 are considered weak and coefficients approaching ±1 are considered extremely strong.

regression line : the straight line through a scatterplot that best summarizes the relationship; roughly, it marks the average Y at each X.

scatterplot : a graph of individual data points from a set of x-y pairs. The X value in a scatterplot is the “given”. e.g. “Given cups of coffee consumed, how nervous are people?”

strength of a relationship : (aka degree of association) how consistently one value of Y is associated with one and only one value of X. Strength ranges from 0 to 1 (max = 1) if the sign is ignored, or from -1 to +1 if the sign is kept.

perfect correlation : a correlation coefficient of -1 or +1.

pearson correlation coefficient : represented as r. Describes the linear relationship between two interval or ratio variables.

restricted range : when the X or Y scores in the data cover a narrower range than they otherwise would; a restricted range artificially shrinks r, so a reasonable breadth of scores is required.

sampling distribution of r : the distribution you would get by taking an infinite number of samples from the population and computing r for each one. It is approximately normal.
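A minimal simulation sketch of that idea, assuming numpy is available and a true population correlation of 0 (everything here is illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
samples, n_pairs = 10_000, 10

# Draw many samples of X-Y pairs from a population where the true correlation is 0,
# and compute r for each sample.
rs = [np.corrcoef(rng.normal(size=n_pairs), rng.normal(size=n_pairs))[0, 1]
      for _ in range(samples)]

# The r values pile up near 0 and thin out toward +/-1 -- an approximation of the
# sampling distribution of r when the null hypothesis is true.
print(round(float(np.mean(rs)), 3), round(float(np.std(rs)), 3))
```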

Correlational Analysis

The four main differences as compared to an experiment.

  1. In an experiment you examine the X scores and the Y scores separately (summarizing the Ys at each condition of X); in correlational analysis you look at the X-Y pairs only.
  2. b/c we look at all the pairs, correlational analysis is single-sample: N = the number of pairs.
  3. X is determined by the question. “Given an amount of coffee, how nervous are people?” would mean that X = amount of coffee.
  4. Data is graphed on a scatterplot.

Types of relationships

“As x changes, the y’s…”

  • linear: it follows a single straight line. It can go up (positive linear relationship) or down (negative linear relationship)
  • nonlinear: aka curvilinear; e.g. “given age, how fast are you?” follows an inverted-U shape (speed rises, peaks, then falls with age).
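A quick sketch (my own illustration, assuming numpy) of why this distinction matters for the Pearson coefficient below: a strong curvilinear relationship can still produce an r near 0, because r only captures the linear part.

```python
import numpy as np

# Hypothetical "given age, how fast are you?" data: speed rises, peaks, then falls.
age = np.arange(5, 85, 5)
speed = -(age - 42.5) ** 2         # inverted-U centered on the middle of the age range

r = np.corrcoef(age, speed)[0, 1]
print(round(float(r), 2))          # essentially 0, despite the obvious relationship
```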

Pearson correlation coefficient

This determines the “average” amount that the paired X and Y scores correspond. We translate each X and each Y into its z-score (z_X and z_Y) and compare them.

Requires:

  1. X and Y scores each form an approximately normal distribution
  2. Avoid a restricted range of X or Y scores (see the sketch after this list).
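A small sketch of the restricted-range problem (assuming numpy; the data are made up): the same underlying relationship looks weaker when only a narrow slice of the X scores is kept.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = x + rng.normal(scale=0.5, size=500)          # a strong positive linear relationship

r_full = np.corrcoef(x, y)[0, 1]

narrow = x > 1                                   # restrict the range: keep only high X scores
r_restricted = np.corrcoef(x[narrow], y[narrow])[0, 1]

print(round(r_full, 2), round(r_restricted, 2))  # the restricted r is noticeably smaller
```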

At a high level, we compare the paired z-scores directly. That’s what the janky “defining formula” below does, but it yields too many rounding errors in practice:

\begin{math} r = \frac{\Sigma(z_x z_y)}{N} \end{math}

To compute r, we instead use:

\begin{math} r = \frac{N(\Sigma XY) - (\Sigma X)(\Sigma Y)}{\sqrt{[N(\Sigma X^2)-(\Sigma X)^2] [N(\Sigma Y^2)-(\Sigma Y)^2]}} \end{math}

It’s worth noting that ΣXY is the “sum of the cross products”, i.e. the sum of the individual (X·Y) products.
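A minimal Python sketch of the computational formula (the function name and variable names are mine, not from the notes):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r via the computational (raw-score) formula above."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # the "sum of the cross products"

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator
```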

Example

X = Glasses of Juice per Day; Y = Doctor Visits per Year

| Participant | X       | X^2       | Y       | Y^2        | XY       |
|-------------|---------|-----------|---------|------------|----------|
| 1           | 0       | 0         | 8       | 64         | 0        |
| 2           | 0       | 0         | 7       | 49         | 0        |
| 3           | 1       | 1         | 7       | 49         | 7        |
| 4           | 1       | 1         | 6       | 36         | 6        |
| 5           | 1       | 1         | 5       | 25         | 5        |
| 6           | 2       | 4         | 4       | 16         | 8        |
| 7           | 2       | 4         | 4       | 16         | 8        |
| 8           | 3       | 9         | 4       | 16         | 12       |
| 9           | 3       | 9         | 2       | 4          | 6        |
| 10          | 4       | 16        | 0       | 0          | 0        |
| N = 10      | ΣX = 17 | ΣX^2 = 45 | ΣY = 47 | ΣY^2 = 275 | ΣXY = 52 |

(ΣX)^2 = 289, (ΣY)^2 = 2209

\begin{align} r &= \frac{N(\Sigma XY) - (\Sigma X)(\Sigma Y)}{\sqrt{[N(\Sigma X^2)-(\Sigma X)^2] [N(\Sigma Y^2)-(\Sigma Y)^2]}} \\ &= \frac{10(52) - (17)(47)}{\sqrt{[10(45)-289] [10(275)-2209]}} \\ &= \frac{520 - 799}{\sqrt{[450-289] [2750-2209]}} \\ &= \frac{-279}{\sqrt{[161] [541]}} \\ &= \frac{-279}{\sqrt{87101}} \\ &= \frac{-279}{295.129} \\ &= -.95 \end{align}
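As a sanity check, plugging the juice/doctor-visit data into the pearson_r sketch above reproduces the hand calculation:

```python
juice  = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4]   # X: glasses of juice per day
visits = [8, 7, 7, 6, 5, 4, 4, 4, 2, 0]   # Y: doctor visits per year

print(round(pearson_r(juice, visits), 2))  # -0.95
```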

Significance testing

df = N - 2 (where N is the number of X-Y pairs).

Steps:

  1. Compute the obtained coefficient, r_obt.
  2. Define H0 and Ha and decide whether the test is one- or two-tailed.
  3. Find the critical value r_crit in the r-tables, using df.
  4. If r_obt is beyond r_crit, the correlation is significant.
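In practice a library call does steps 1-4 in one shot. A sketch assuming scipy is available (scipy.stats.pearsonr returns r and a two-tailed p-value):

```python
from scipy import stats

juice  = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4]
visits = [8, 7, 7, 6, 5, 4, 4, 4, 2, 0]

r, p = stats.pearsonr(juice, visits)   # df = N - 2 = 8 behind the scenes
print(round(r, 2), p < .05)            # -0.95 True -> significant at alpha = .05
```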