pronounced “kigh square”
Used when measuring a nominal variable. We deal with the frequency of individuals in a category (like “republicans” or “answered yes”). It lets us answer questions like “40% of people said yes… but does this reflect what we’d expect from the underlying population?”
There is no limit on the number of categories we can test. With one variable we use the one-way chi square procedure; with two variables, the two-way.
When reported in a paper, it looks like χ²(1, N = 50) = 18.00, p &lt; .05, which says “with df = 1 and N = 50 we found a significant χ²_obt of 18.00.”
Definitions
observed frequency : f_o. The frequency with which participants fall into a given category.
expected frequency : f_e. The frequency we expect in a category if the sample data perfectly represented the distribution in the population described by H₀ (no difference).
one-way chi square procedure
There are 5 assumptions of this test.
- Participants are categorized along one variable having 2+ categories and we can count the frequency in those categories.
- Each participant can only be in one category.
- Category membership is independent (the fact that someone is in one category does not influence the probability that another participant will be in any category)
- We include the responses of all participants in the study
- The “expected frequencies” must be at least 5 per category.
There’s no standard way to state the null hypothesis, so we say:

H₀: all frequencies in the population match the expected distribution (no difference)

and

Hₐ: not all frequencies in the population match the expected distribution
Formulas:

χ²_obt = Σ ( (f_o − f_e)² / f_e )

df = k − 1, where k is the number of categories
Unlike ANOVA, there is no post-hoc testing nor effect size to calculate, merely a description of the observed frequencies.
example
We’re testing whether geniuses are more often left-handed than the population.
In the population, we know that left-handedness happens 10% of the time.
Looking at geniuses, we see a 20% rate.
left handed | right handed |
---|---|
f_o = 10 | f_o = 40 |
f_e = 5 | f_e = 45 |
Because χ²_obt = 5.56 and χ²_crit = 3.84 (df = 1, α = .05), χ²_obt &gt; χ²_crit, so this is a significant difference.
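The one-way arithmetic above can be checked with a short pure-Python sketch (the frequencies come from the handedness example; a library routine like `scipy.stats.chisquare` would give the same statistic):

```python
# One-way chi-square for the handedness example: N = 50 geniuses,
# H0 says 10% of the population is left-handed.
f_o = [10, 40]                 # observed: left-handed, right-handed
f_e = [0.10 * 50, 0.90 * 50]   # expected under H0: [5.0, 45.0]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(f_o, f_e))
print(round(chi_sq, 2))  # 5.56, which exceeds the critical value 3.84 (df = 1)
```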
two-way chi square procedure
Tests whether or not a variable is independent of (i.e. unrelated to) another variable. This is like interaction testing in two-way ANOVA.
Used when there are two variables, like this. Note that it doesn’t have to be a 2x2 design; it could be 4x7, as long as there are only two factors.
| type A personality | type B
---|---|---
heart attack | |
no heart attack | |
A dataset of complete independence would be:
| type A personality | type B
---|---|---
heart attack | 20 | 20
no heart attack | 20 | 20
Total dependence would be:
| type A personality | type B
---|---|---
heart attack | 40 | 0
no heart attack | 0 | 40
H₀: Category membership on one variable is independent of category membership on the other. Hₐ: Category membership on one variable is dependent on the other.
In this setup, we can find the expected frequency for each cell via:

f_e = (cell’s row total)(cell’s column total) / N
example
| type A personality | type B
---|---|---
heart attack | 25 | 10
no heart attack | 5 | 40
- Calculate f_e for each cell:
| type A personality | type B
---|---|---
heart attack | (30*35)/80 = 13.125 | (50*35)/80 = 21.875
no heart attack | (45*30)/80 = 16.875 | (50*45)/80 = 28.125
- Calculate χ²_obt = Σ ( (f_o − f_e)² / f_e ) = 30.56.

χ²_crit = 3.84 (df = (rows − 1)(columns − 1) = 1, α = .05). Since 30.56 &gt; 3.84, this result is significant and we can reject H₀.
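The same two-way computation, expected frequencies and all, can be sketched in pure Python using the observed table from this example:

```python
# Two-way chi-square for the heart-attack example.
# observed[row][col]: rows = heart attack / no heart attack,
#                     cols = type A / type B
observed = [[25, 10], [5, 40]]

row_totals = [sum(row) for row in observed]        # [35, 45]
col_totals = [sum(col) for col in zip(*observed)]  # [30, 50]
n = sum(row_totals)                                # 80

chi_sq = 0.0
for r in range(2):
    for c in range(2):
        f_e = row_totals[r] * col_totals[c] / n    # (row total)(col total)/N
        chi_sq += (observed[r][c] - f_e) ** 2 / f_e

print(round(chi_sq, 2))  # 30.56, well past the 3.84 critical value (df = 1)
```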
Post-hoc testing
We can post-hoc test two-way chi square.
phi coefficient
φ = √(χ²_obt / N). A post-hoc test for a 2x2 chi-square to determine how dependent the variables are. 0 is complete independence; 1 is total dependence. Real research tends to fall in the .2–.5 range.
Given the example above, φ = √(30.56 / 80) ≈ .62, so this is a pretty strong effect.
Squaring a correlation coefficient gives us the “proportion of variance accounted for” (a la effect size), so φ² ≈ .38 says 38% of the differences in heart attacks is associated with personality type.
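As a quick sanity check of the phi arithmetic, using χ²_obt ≈ 30.56 and N = 80 computed from the two-way example:

```python
import math

chi_sq = 30.56  # chi-square from the two-way heart-attack example
n = 80          # total participants

phi = math.sqrt(chi_sq / n)
print(round(phi, 2))       # 0.62
print(round(phi ** 2, 2))  # 0.38 -> 38% of variance accounted for
```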
contingency coefficient
C = √(χ²_obt / (N + χ²_obt)). This is like the phi coefficient for designs that aren’t 2x2 (e.g. 2x3 or 3x3 etc).
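For illustration only, here is the contingency coefficient formula applied to the same example’s numbers (a real use would be on a table larger than 2x2, where phi doesn’t apply):

```python
import math

chi_sq = 30.56  # chi-square from the two-way example, reused for illustration
n = 80          # total participants

c = math.sqrt(chi_sq / (n + chi_sq))
print(round(c, 2))  # 0.53
```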