
ANOVA

Comparison of Means from More Than 2 Groups

WHERE WE HAVE BEEN...

Compare mean obtained from one group to predetermined number
1. One-mean hypothesis test
2. Can either be one-tailed (specify direction of association) or two-tailed (no direction specified)

WHERE WE HAVE BEEN...

Compare means from two groups
1. Two-means hypothesis test
2. Can either be one-tailed (specify direction of association) or two-tailed (no direction specified)

WHAT WE WERE REALLY DOING WHEN COMPARING TWO MEANS

Testing hypothesis about association between two variables
- Associations are between CATEGORICAL IV (nominal or ordinal) with two categories and CONTINUOUS DV

WHERE WE ARE GOING...

What if we have more than 2 groups to compare?
- For example, what if we want to know if happiness scores among people who are married, divorced, widowed, or never married differ from one another?
- Cannot use z-tests or t-tests with more than 2 groups. So what do we use instead?

ANOVA

Analysis of Variance (ANOVA)

- Allows us to test whether there is association between CATEGORICAL IV (nominal or ordinal level) that has more than 2 categories and CONTINUOUS DV

stating hypotheses

Null: no association. Research: association.

WHAT WOULD PERFECT ASSOCIATION BETWEEN CITY TYPE AND MURDER RATES LOOK LIKE?

- WITHIN every category of city type, all values (i.e., murder rates) would be the same
- BETWEEN categories of city type, mean murder rates would be different
- In other words, mean murder rates would be different for each type of city BUT
- All cities within each type (manufacturing, trade, government) would have the same murder rate

WHAT WOULD PERFECT ASSOCIATION LOOK LIKE?

Every row within its respective column would have the same number

WHAT WOULD ABSOLUTELY NO ASSOCIATION LOOK LIKE?

No correlation between numbers in columns

WHAT REALITY GENERALLY LOOKS LIKE

Of course, we never have perfect association (or absolutely no association) between two variables in social science
- BUT when we have STRONG association, most of variation occurs BETWEEN categories
- Means that independent variable (city type) explains most of variation in dependent variable (murder rate)

PARTITIONING VARIANCE

How much single observation deviates from grand mean
- Mathematically we can divide total deviation for given observation (x_ik) into
1. Extent to which x_ik differs from group mean (x̄_k) (i.e., difference WITHIN category)
2. Extent to which group mean (x̄_k) differs from grand mean (i.e., difference BETWEEN categories)
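The additive split above can be checked numerically. A minimal sketch with made-up values (not data from the lecture):

```python
# Partitioning the total deviation of one observation x_ik:
# (x_ik - grand mean) = (x_ik - group mean) + (group mean - grand mean)
x_ik = 5.0         # one observation, e.g. one city's murder rate (illustrative)
group_mean = 4.0   # mean of that city's type (illustrative)
grand_mean = 4.3   # mean over all cities (illustrative)

within_dev = x_ik - group_mean         # deviation WITHIN the category
between_dev = group_mean - grand_mean  # deviation BETWEEN categories
total_dev = x_ik - grand_mean

print(abs(total_dev - (within_dev + between_dev)) < 1e-9)  # True: parts add up
```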

PARTITIONING VARIANCE FOR SINGLE OBSERVATION

We are interested in doing this for every observation in data set
- Dividing TOTAL variation across all observations into variation BETWEEN categories & variation WITHIN categories
- We will use same methods to partition variance but this time do it for all observations in data set
- So we use sum of squares again

IN ENGLISH, PLEASE

WITHIN GROUP sum of squares is sum of squared deviation of every raw score from its group mean
1. You are figuring out extent to which each raw score (x_ik) deviates from group mean (x̄_k)
2. Squaring these deviation scores to get rid of negative signs
3. Then adding up squared deviations from each observation within given group
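The three steps above can be sketched in Python. The murder rates here are made-up toy numbers, not data from the lecture:

```python
# Within-group sum of squares: squared deviations of raw scores
# from their own group mean, summed over every group.
groups = {
    "manufacturing": [3.0, 4.0, 5.0],  # toy murder rates
    "trade": [6.0, 7.0, 8.0],
    "government": [1.0, 2.0, 3.0],
}

ss_within = 0.0
for scores in groups.values():
    mean_k = sum(scores) / len(scores)             # step 1: group mean
    sq_devs = [(x - mean_k) ** 2 for x in scores]  # steps 1-2: square the deviations
    ss_within += sum(sq_devs)                      # step 3: add them up

print(ss_within)  # 6.0 for these toy numbers
```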

MORE ENGLISH, PLEASE

In contrast, BETWEEN GROUP sum of squares is sum of squared deviation of every group mean from grand mean
1. You are figuring out extent to which each group mean (x̄_k) deviates from grand mean (x̄_total)
2. Squaring these deviation scores to get rid of negative signs
3. Then adding up squared deviations for all groups
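These steps can be sketched the same way (toy numbers again). A common computing form weights each squared deviation by the number of cases in the group, which makes SS between and SS within add up to SS total:

```python
# Between-group sum of squares: squared deviation of each group mean
# from the grand mean, weighted by the number of cases in the group.
groups = {
    "manufacturing": [3.0, 4.0, 5.0],  # toy murder rates
    "trade": [6.0, 7.0, 8.0],
    "government": [1.0, 2.0, 3.0],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)

ss_between = 0.0
for scores in groups.values():
    mean_k = sum(scores) / len(scores)                      # step 1: group mean
    ss_between += len(scores) * (mean_k - grand_mean) ** 2  # steps 2-3

print(round(ss_between, 6))  # 38.0 for these toy numbers
```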

TESTING FOR ASSOCIATION WITH ANOVA

How do we use this information to determine if there is association between IV (type of city) & DV (murder rates)?
- If most of TOTAL VARIATION (SS total) can be attributed to variation WITHIN categories of IV
- Then there is NO ASSOCIATION between IV and DV

TESTING FOR ASSOCIATION WITH ANOVA

How do we use this information to determine if there is association between IV (type of city) & DV (murder rates)?
- If most of TOTAL VARIATION (SS total) can be attributed to variation BETWEEN categories of IV
- Then there is SIGNIFICANT ASSOCIATION between IV and DV

step 1

CALCULATE MEAN FOR EACH GROUP

step 2

CALCULATE WITHIN GROUP SUM OF SQUARES

step 3

calculate between group sum of squares

STEP #4

CALCULATE DEGREES OF FREEDOM (BETWEEN & WITHIN)

step 4

df between = k - 1, where k is number of categories in IV
df within = n - k, where n is number of cases & k is number of categories in IV
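A tiny illustration with assumed counts (k = 3 groups, n = 15 cases; both numbers are hypothetical):

```python
k = 3   # number of categories in the IV (assumed)
n = 15  # total number of cases (assumed)

df_between = k - 1  # degrees of freedom between groups
df_within = n - k   # degrees of freedom within groups
print(df_between, df_within)  # 2 12
```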

STEP #5

CALCULATE MEAN SQUARES (BETWEEN & WITHIN)

step 5

Transform sums of squares (which are measures of variation) into measures of VARIANCE
- Measures of VARIANCE (mean squares) differ from gross measures of VARIATION (sums of squares) because...
- VARIANCE (mean squares) takes into account degrees of freedom (i.e., sample size & number of groups in IV)

mean squares between

MS between = SS between / df between

mean squares within

MS within = SS within / df within
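With illustrative sums of squares and degrees of freedom (made-up values), the two mean squares are simple divisions:

```python
# Convert sums of squares into mean squares (variances) by dividing
# each by its matching degrees of freedom. All values are illustrative.
ss_between, ss_within = 38.0, 6.0
df_between, df_within = 2, 6

ms_between = ss_between / df_between  # 19.0
ms_within = ss_within / df_within     # 1.0
print(ms_between, ms_within)
```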

STEP #6

CONDUCT HYPOTHESIS TEST
- Follow same 5 steps we have been using for hypothesis testing
1. State null & alternative hypotheses
2. Determine alpha level
3. Determine critical value of F
4. Compute test statistic (in this case, use F test)
5. Compare observed F to critical F & state conclusion

STEP #1

state hypotheses

step 2

determine alpha level

step 3

find critical F
- If df fall between two listed values, use SMALLER df
- If df is greater than largest listed value (> 20 in numerator or > 1000 in denominator), use infinity for that component

step 4

calculate observed F
- The HIGHER the ratio, the more variance can be attributed to differences BETWEEN categories
- The LOWER the ratio, the more variance can be attributed to differences WITHIN categories

F observed = MS between / MS within
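Continuing with the same illustrative mean squares:

```python
# Observed F ratio from illustrative (made-up) mean squares.
ms_between, ms_within = 19.0, 1.0
f_observed = ms_between / ms_within
print(f_observed)  # 19.0
```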

step 5

compare critical f to observed f

step 5

Remember, with ANOVA we are testing whether between group variance is greater than within group variance
- We want to know whether observed value of F is relatively large
- If observed F is greater than critical F, we will reject H0 and conclude there is an association between independent & dependent variables
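The whole decision rule can be sketched end to end on toy data. This sketch assumes scipy is available; `f_oneway` and `f.ppf` are standard scipy functions, but the group values are made up:

```python
from scipy import stats

# Toy data: three groups of scores (made-up numbers).
groups = [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0], [1.0, 2.0, 3.0]]
alpha = 0.05  # step 2: alpha level

k = len(groups)
n = sum(len(g) for g in groups)
f_critical = stats.f.ppf(1 - alpha, k - 1, n - k)  # step 3: critical F
f_observed, p_value = stats.f_oneway(*groups)      # step 4: observed F

reject_h0 = f_observed > f_critical                # step 5: compare & conclude
print(round(f_observed, 2), round(f_critical, 2), reject_h0)
```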

STRENGTH OF ASSOCIATION

Once we know that there is SIGNIFICANT association between IV & DV, we need to estimate STRENGTH of association
- This is important because it is possible for associations that exist in population to differ in how strong (or important) they are
- In fact, relatively weak association can be significant if sample size is large enough

STRENGTH OF ASSOCIATION

Measure strength of association in ANOVA using eta squared (η²)
- Indicates proportion of total variation that is due to (explained by) independent variable
- η² = SS between / SS total
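A quick sketch with illustrative sums of squares (made-up values, not the reading-comprehension example that follows):

```python
# Eta squared: share of total variation explained by the IV.
ss_between, ss_total = 38.0, 44.0  # illustrative values
eta_squared = ss_between / ss_total
print(round(eta_squared, 3))  # 0.864 -> strong by the usual rule of thumb
```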

STRENGTH OF ASSOCIATION

Interpretation: 6.66% of total variation in dependent variable (reading comprehension) is explained by independent variable (type of school)
- Thus, association between reading comprehension & type of school is significant but WEAK.

WHAT LEVEL OF ETA IS CONSIDERED STRONG OR WEAK?

< 10% = weak
10%-25% = moderate
> 25% = strong
- Remember, this is also dependent upon your research question, hypotheses, units used to measure variables & expected effect size

WHAT'S UP WITH ANOVA... IS IT ONE-TAILED OR TWO-TAILED?

ANOVA is an OMNIBUS test, meaning that it just tests OVERALL differences
- There really isn't a one-tailed vs. two-tailed option with ANOVA (or F distribution)
- F test is one-tailed. We reject H0 if observed F is greater than critical F
- However, ANOVA really tests a two-tailed hypothesis because it tests whether there is a significant difference between groups (does not state a specific directional difference)

In Other Words...

Significant F test only tells us that at least two of groups are significantly different on DV
- But we cannot tell which two are different
- Could conduct t-test of difference between two means to determine which two groups are significantly different from each other
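A sketch of that follow-up, assuming scipy is available; the happiness scores are made-up illustrations. Note that running many t-tests inflates the chance of a Type I error, so in practice an adjustment such as a Bonferroni correction is often applied:

```python
from itertools import combinations
from scipy import stats

# Made-up happiness scores by marital status.
groups = {
    "married": [7.0, 8.0, 6.0],
    "divorced": [4.0, 5.0, 3.0],
    "widowed": [5.0, 6.0, 4.0],
}

# Pairwise two-sample t-tests between every pair of groups.
p_values = {}
for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
    t_stat, p_value = stats.ttest_ind(a, b)
    p_values[(name_a, name_b)] = p_value
    print(f"{name_a} vs {name_b}: p = {p_value:.3f}")
```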

NORMAL DISTRIBUTION & EQUALITY OF VARIANCES

When using ANOVA, we assume that the dependent variable is normally distributed
- However, if sample size is large enough, we can relax this assumption because of CLT
- Equal variances? ANOVA assumes that in population, variance of DV is equivalent across groups. Sample variances may not be exactly equal; if they are close enough, F test will be valid
- Nonequivalence of variances only makes a difference when working with small sample sizes (not common in Sociology)