multinomial : qualitative data with more than two outcomes
classes, categories, cells : names for the possible outcomes of a multinomial experiment
one-way table : a table which summarizes the results of a multinomial experiment for a single qualitative variable
chi-square test : a test that compares the frequency distribution of categorical data against an expectation (see: chi-square procedure)
Properties of a multinomial experiment:
- There are $n$ identical trials.
- There are $k$ possible outcomes of each trial.
- The probabilities of the $k$ outcomes, $p_1, p_2, \ldots, p_k$, sum to 1 and remain the same from trial to trial.
- The trials are independent.
- The random variables of interest are the “cell counts” $n_1, n_2, \ldots, n_k$: the number of observations that fall into each of the $k$ categories.
Test of hypothesis about multinomial probabilities: one-way table
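$H_0: p_1 = p_{1,0},\ p_2 = p_{2,0},\ \ldots,\ p_k = p_{k,0}$, where $p_{1,0}$, $p_{2,0}$, etc. are hypothesized values for the probabilities.
$H_a$: At least one of the probabilities doesn’t equal its hypothesized value.
Test stat: $\chi^2_c = \sum_{i=1}^{k} \frac{(n_i - E_i)^2}{E_i}$, where $n_i$ is the observed count in cell $i$ and $E_i = n p_{i,0}$ is the expected cell count for $n$ trials.
Rejection region: $\chi^2_c > \chi^2_\alpha$, where $\chi^2_\alpha$ has $k - 1$ degrees of freedom.
p-value: $P(\chi^2 > \chi^2_c)$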
This requires the following conditions:
- A multinomial experiment has been conducted.
- The sample size $n$ is large enough that, for every cell, the expected cell count $E_i$ is 5 or more.
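As a rough sketch of the one-way test above, the following Python computes the test statistic, the degrees of freedom, and the p-value using only the standard library. The function names `chi2_sf` and `gof_test` and the counts in the example are made up for illustration; they are not from the notes or the textbook.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square > x) for an integer df >= 1.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1)
    for the regularized upper incomplete gamma function, with z = x / 2.
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # Q(1/2, z)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def gof_test(observed, hypothesized_probs):
    """Return (statistic, df, p_value) for a one-way chi-square test."""
    n = sum(observed)
    expected = [n * p for p in hypothesized_probs]
    if min(expected) < 5:
        raise ValueError("every expected cell count should be 5 or more")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)

# Hypothetical example: 100 trials, three categories.
observed = [30, 50, 20]
stat, df, p_value = gof_test(observed, [0.25, 0.50, 0.25])
print(stat, df, p_value)
```

The null hypothesis is rejected when the p-value falls below the chosen significance level.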
Chi-Square test
To do this on the calculator, enter the observed counts and the expected (theoretical) counts (e.g. $n p_i$, the sample size times each hypothesized probability) in two different lists and reference them in the Chi-Square GOF (Goodness of Fit) function.
Test of independence
Tests whether one variable is independent of another (e.g. does your Hogwarts house influence whether you like pineapple on pizza?).
answer | gryffindor | hufflepuff | ravenclaw | slytherin | total |
---|---|---|---|---|---|
no | 79 | 122 | 204 | 74 | 479 |
yes | 82 | 130 | 240 | 69 | 521 |
total | 161 | 252 | 444 | 143 | 1000 |
df = (rows-1) * (columns-1) = (2-1) * (4-1) = 3
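From here, we calculate the probability of each cell under independence (row proportion * column proportion), multiply by the overall total to get the expected counts, then run the observed and expected counts through the GoF function with df = 3.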
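The expected-count step for the table above can be sketched in Python as follows. The helper name `expected_counts` is an assumption made for this illustration. The flattened lists at the end mirror the two lists that would be entered on the calculator.

```python
# Observed counts from the house/pizza table (rows: no, yes).
observed = [
    [79, 122, 204, 74],
    [82, 130, 240, 69],
]

def expected_counts(table):
    """Expected count per cell: row_total * column_total / overall_total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Flattened lists, in the shape a list-based GoF function expects.
observed_list = [o for row in observed for o in row]
expected_list = [e for row in expected for e in row]

print(expected, chi_square, df)
```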
Example 11.6 from Introductory Statistics book
In a volunteer group, adults 21 and older volunteer from one to nine hours each week to spend time with a disabled senior citizen. The program recruits among community college students, four-year college students, and nonstudents. In Table 11.15 is a sample of the adult volunteers and the number of hours they volunteer per week.
Is the number of hours volunteered independent of the type of volunteer?
H_0: hours worked are independent of type
H_a: hours worked are dependent on type
Actually observed:
Type | 1-3hr | 4-6 | 7-9 | total |
---|---|---|---|---|
community college | 111 | 96 | 48 | 255 |
four-year college students | 96 | 133 | 61 | 290 |
non-students | 91 | 150 | 53 | 294 |
total | 298 | 379 | 162 | 839 |
Expected results, based on taking row_total * column_total / overall_total:
Type | 1-3hr | 4-6 | 7-9 |
---|---|---|---|
community college | 255*298/839 | 255*379/839 | 255*162/839 |
four-year college students | 290*298/839 | 290*379/839 | 290*162/839 |
non-students | 294*298/839 | 294*379/839 | 294*162/839 |
which is:
Type | 1-3hr | 4-6 | 7-9 |
---|---|---|---|
community college | 90.572110 | 115.19070 | 49.237187 |
four-year college students | 103.00358 | 131.00119 | 55.995232 |
non-students | 104.42431 | 132.80810 | 56.767580 |
Then we compare observed vs expected via each cell's chi-square contribution (the squared Pearson residual), (observed - expected)^2 / expected:
Type | 1-3hr | 4-6 | 7-9 |
---|---|---|---|
community college | (111-90.572110)^2/90.572110 | (96-115.19070)^2/115.19070 | (48-49.237187)^2/49.237187 |
four-year college students | (96-103.00358)^2/103.00358 | (133-131.00119)^2/131.00119 | (61-55.995232)^2/55.995232 |
non-students | (91-104.42431)^2/104.42431 | (150-132.80810)^2/132.80810 | (53-56.767580)^2/56.767580 |
which is:
Type | 1-3hr | 4-6 | 7-9 | total |
---|---|---|---|---|
community college | 4.6074 | 3.1972 | 0.0311 | |
four-year college students | 0.4762 | 0.0305 | 0.4473 | |
non-students | 1.7258 | 2.2255 | 0.2500 | |
total | 6.8093 | 5.4531 | 0.7285 | 12.9909 |
Degrees of freedom = 4 b/c (3 rows - 1) * (3 columns - 1) = 2*2 = 4
p-value = cdf(12.99, infinity, df=4) = .0113
Assuming α = 0.05, α > p-value, so reject the null hypothesis: the number of hours volunteered is not independent of the type of volunteer.
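As a cross-check of this worked example, the sketch below recomputes the statistic from the observed table, dividing each squared difference by the expected count. For df = 4 the chi-square upper tail has the closed form exp(-x/2) * (1 + x/2), so no statistics library is needed. The significance level `alpha = 0.05` is an assumption, not a value given in the notes.

```python
import math

# Observed counts from Example 11.6
# (rows: community college, four-year college, non-students).
observed = [
    [111, 96, 48],
    [96, 133, 61],
    [91, 150, 53],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum of (observed - expected)^2 / expected,
# with expected = row_total * column_total / overall_total.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability; this closed form is valid only for df = 4.
p_value = math.exp(-chi_square / 2) * (1 + chi_square / 2)

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha
print(round(chi_square, 2), df, round(p_value, 4), reject_null)
```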
Test of homogeneity
Tests if two samples came from the same population
The expected count for each cell is $E = \frac{R \cdot C}{n}$, where $R$ is the number in the row, $C$ is the number for that column, and $n$ is the number total.