statistics

multinomial : qualitative data with more than two outcomes

classes, categories, cells : names for the possible outcomes of a multinomial experiment

one-way table : a table which summarizes the results of a multinomial experiment for a single qualitative variable

chi-square test : a test that compares the frequency distribution of categorical data against an expectation (see: chi-square procedure)

Properties of a multinomial experiment:

  1. There are $n$ identical trials.
  2. There are $k$ possible outcomes of each trial.
  3. The probabilities of the $k$ outcomes, $p_1, p_2, \dots, p_k$, sum to 1 and remain the same from trial to trial.
  4. The trials are independent.
  5. The random variables of interest are the “cell counts” $n_1, n_2, \dots, n_k$: the number of observations that fall into each of the $k$ categories.
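As a rough illustration of these properties, the sketch below simulates a multinomial experiment with Python's standard library and tallies the cell counts. The category labels, probabilities, and number of trials are hypothetical values chosen only for the example.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable

categories = ["A", "B", "C"]      # k = 3 possible outcomes (hypothetical labels)
probabilities = [0.5, 0.3, 0.2]   # sum to 1 and stay the same on every trial
n = 200                           # number of identical, independent trials

# Each draw is one trial; random.choices draws independently with replacement.
outcomes = random.choices(categories, weights=probabilities, k=n)

# The cell counts are the number of trials that landed in each category.
tally = Counter(outcomes)
cell_counts = [tally[c] for c in categories]

print(dict(zip(categories, cell_counts)))
```

The cell counts always add up to the number of trials, which is why only k - 1 of them are free to vary.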

Test of hypothesis about multinomial probabilities: one-way table

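$H_0: p_1 = p_{1,0},\ p_2 = p_{2,0},\ \dots,\ p_k = p_{k,0}$, where $p_{1,0}$, $p_{2,0}$, etc. are hypothesized values for the probabilities.

$H_a$: At least one of the probabilities doesn’t equal its hypothesized value.

Test stat: $\chi^2 = \sum_{i=1}^{k} \frac{(n_i - E_i)^2}{E_i}$, where $n_i$ is the observed count for cell $i$ and $E_i = n p_{i,0}$ is the expected cell count ($n$ is the total number of trials).

Rejection region: $\chi^2 > \chi^2_\alpha$, where $\chi^2_\alpha$ has $k - 1$ degrees of freedom.

p-value: $P(\chi^2 > \chi^2_c)$, where $\chi^2_c$ is the computed value of the test statistic.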

This requires the following conditions:

  1. multinomial experiment has been conducted
  2. the sample size is large enough that the expected cell count $E_i$ is 5 or more for every cell.
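The one-way test can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the helper names `chi2_sf` and `goodness_of_fit` are made up for this example, the observed counts and hypothesized probabilities are hypothetical, and the p-value comes from a power series for the regularized incomplete gamma function rather than from a statistics library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Sums the power series for the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

def goodness_of_fit(observed, hypothesized_probs):
    """Return (test statistic, degrees of freedom, p-value) for a one-way table."""
    n = sum(observed)
    expected = [n * p for p in hypothesized_probs]
    # Condition 2: every expected cell count should be 5 or more.
    if min(expected) < 5:
        raise ValueError("every expected cell count should be at least 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)

# Hypothetical data: 100 trials spread over three categories.
observed = [45, 35, 20]
hypothesized = [0.5, 0.3, 0.2]

stat, df, p_value = goodness_of_fit(observed, hypothesized)
print(f"chi-square = {stat:.4f}, df = {df}, p-value = {p_value:.4f}")
```

Reject the null hypothesis when the p-value falls below the chosen significance level.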

Chi-Square test

To do this on the calculator, enter the observed values and the theoretical (expected) values in two different lists and reference them in the Chi-Square GOF (Goodness of Fit) function.

Test of independence

Tests whether one variable is independent of another (e.g. does your Hogwarts house influence whether you like pineapple on pizza).

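| | gryffindor | hufflepuff | ravenclaw | slytherin | total |
|---|---|---|---|---|---|
| no | 79 | 122 | 204 | 74 | 479 |
| yes | 82 | 130 | 240 | 69 | 521 |
| total | 161 | 252 | 444 | 143 | 1000 |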

df = (2-1) * (4-1) = 3

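From here, we calculate the expected count for each cell under independence (row total * column total / overall total), then compare the observed counts against the expected counts with the chi-square statistic, using the df above.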
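A sketch of that calculation for the house table, using only the standard library. It takes the rejection-region route: the critical value 7.815 is the standard chi-square table entry for df = 3 at a significance level of 0.05, and that significance level is an assumption, since the notes do not state one.

```python
houses = ["gryffindor", "hufflepuff", "ravenclaw", "slytherin"]
rows = [
    [79, 122, 204, 74],   # no
    [82, 130, 240, 69],   # yes
]

row_totals = [sum(r) for r in rows]
col_totals = [sum(c) for c in zip(*rows)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / overall total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(rows, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(rows) - 1) * (len(col_totals) - 1)

CRITICAL_VALUE = 7.815  # chi-square table, df = 3, alpha = 0.05 (assumed level)
reject = stat > CRITICAL_VALUE

print(f"chi-square = {stat:.4f}, df = {df}, reject H_0: {reject}")
```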

Example 11.6 from Introductory Statistics book

In a volunteer group, adults 21 and older volunteer from one to nine hours each week to spend time with a disabled senior citizen. The program recruits among community college students, four-year college students, and nonstudents. In Table 11.15 is a sample of the adult volunteers and the number of hours they volunteer per week.

Is the number of hours volunteered independent of the type of volunteer?

H_0: hours worked are independent of type

H_a: hours worked are dependent on type

Actually observed:

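| Type | 1-3hr | 4-6 | 7-9 | total |
|---|---|---|---|---|
| community college | 111 | 96 | 48 | 255 |
| four-year college students | 96 | 133 | 61 | 290 |
| non-students | 91 | 150 | 53 | 294 |
| total | 298 | 379 | 162 | 839 |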

Expected results, based on taking row_total * column_total / overall_total:

| Type | 1-3hr | 4-6 | 7-9 |
|---|---|---|---|
| community college | 255*298/839 | 255*379/839 | 255*162/839 |
| four-year college students | 290*298/839 | 290*379/839 | 290*162/839 |
| non-students | 294*298/839 | 294*379/839 | 294*162/839 |

which is:

| Type | 1-3hr | 4-6 | 7-9 |
|---|---|---|---|
| community college | 90.572110 | 115.19070 | 49.237187 |
| four-year college students | 103.00358 | 131.00119 | 55.995232 |
| non-students | 104.42431 | 132.80810 | 56.767580 |
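The expected counts above can be reproduced with a few lines of Python. This is a minimal sketch; the variable names are chosen for the example.

```python
labels = ["community college", "four-year college students", "non-students"]
observed = [
    [111, 96, 48],    # community college
    [96, 133, 61],    # four-year college students
    [91, 150, 53],    # non-students
]

row_totals = [sum(row) for row in observed]          # 255, 290, 294
col_totals = [sum(col) for col in zip(*observed)]    # 298, 379, 162
overall_total = sum(row_totals)                      # 839

# expected = row_total * column_total / overall_total for every cell
expected = [[r * c / overall_total for c in col_totals] for r in row_totals]

for label, row in zip(labels, expected):
    print(label, [round(e, 4) for e in row])
```

The expected counts keep the same row and column totals as the observed table, which is a quick sanity check on the arithmetic.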

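Then we compare expected vs observed via the squared Pearson residuals, (observed - expected)^2 / expected. Note that the divisor is the expected count, not the observed count:

| Type | 1-3hr | 4-6 | 7-9 |
|---|---|---|---|
| community college | (111-90.572110)^2/90.572110 | (96-115.19070)^2/115.19070 | (48-49.237187)^2/49.237187 |
| four-year college students | (96-103.00358)^2/103.00358 | (133-131.00119)^2/131.00119 | (61-55.995232)^2/55.995232 |
| non-students | (91-104.42431)^2/104.42431 | (150-132.80810)^2/132.80810 | (53-56.767580)^2/56.767580 |

which is:

| Type | 1-3hr | 4-6 | 7-9 | total |
|---|---|---|---|---|
| community college | 4.6074 | 3.1972 | 0.0311 | 7.8356 |
| four-year college students | 0.4762 | 0.0305 | 0.4473 | 0.9540 |
| non-students | 1.7258 | 2.2255 | 0.2500 | 4.2013 |
| total | 6.8093 | 5.4531 | 0.7285 | 12.9909 |

Degrees of Freedom = 4 b/c (3 rows - 1) * (3 columns - 1) = 2*2 = 4

p-value = cdf(12.99, infinity, df=4) = .0113

Taking the usual $\alpha$ = 0.05, $\alpha$ > p-value, so reject the null hypothesis: the number of hours volunteered is not independent of the type of volunteer.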
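As a cross-check on the hand calculation, here is a standard-library sketch of the whole test for this example. Each cell contributes (observed - expected)^2 / expected. The p-value uses the closed form exp(-x/2) * (1 + x/2) for the upper tail of a chi-square distribution, which holds only for df = 4; a general solution would need a statistics library or an incomplete gamma routine.

```python
import math

observed = [
    [111, 96, 48],    # community college
    [96, 133, 61],    # four-year college students
    [91, 150, 53],    # non-students
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
overall_total = sum(row_totals)

expected = [[r * c / overall_total for c in col_totals] for r in row_totals]

# Per-cell contribution: (observed - expected)^2 / expected
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
stat = sum(sum(row) for row in contributions)

df = (len(observed) - 1) * (len(col_totals) - 1)

# Closed-form upper-tail probability, valid only when df == 4.
p_value = math.exp(-stat / 2) * (1 + stat / 2)

print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.4f}")
```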

Test of homogeneity

Tests if two samples came from the same population

The expected count for each cell is $E = \frac{r \cdot c}{n}$, where $r$ is the total for the row, $c$ is the total for that column, and $n$ is the overall total.