ANOVA

ANOVA
Comparison of Means from
More Than 2 Groups
WHERE WE HAVE BEEN...
Compare mean obtained from one group to
predetermined number
1. One-mean hypothesis test
2. Can either be one-tailed (specify direction of
association) or two-tailed (no direction
specified)
WHERE WE HAVE BEEN...
Compare means from two groups
1. Two-means hypothesis test
2. Can either be one-tailed (specify direction of
association) or two-tailed (no direction
specified)
WHAT WE WERE REALLY DOING WHEN
COMPARING TWO MEANS
Testing hypothesis about association between two variables
/ Associations are between CATEGORICAL IV (nominal or ordinal) with two categories and CONTINUOUS DV
WHERE WE ARE GOING...
What if we have more than 2 groups to compare?
/For example, what if we want to know if
happiness scores among people who are married,
divorced, widowed, OR never-married differ from
one another?
/ Cannot use z-tests or t-tests with more than 2 groups. So what do we use instead?
ANOVA
Analysis of Variance (ANOVA)
/Allows us to test whether there is association
between CATEGORICAL IV (nominal or ordinal
level) that has more than 2 categories and
CONTINUOUS DV
stating hypothesis
null: no association / research: association
WHAT WOULD PERFECT ASSOCIATION
BETWEEN CITY TYPE AND MURDER RATES
LOOK LIKE?
WITHIN every category of city type, all values
(ie. murder rates) would be same
/ BETWEEN categories of city type, mean murder
rates would be different
/ In other words, mean murder rates would be
different for each type of city BUT
/ All cities within each type (manufacturing, trade,
government) would have same murder rate
WHAT WOULD PERFECT ASSOCIATION
LOOK LIKE?
Every row within a given column would have the same number
WHAT WOULD ABSOLUTELY NO
ASSOCIATION LOOK LIKE?
Group means would be the same across every column; knowing the category would tell you nothing about the DV
WHAT REALITY GENERALLY LOOKS
LIKE
Of course, we never have perfect association (or
absolutely no association) between two variables
in social science
/ BUT when we have STRONG association, most of variation occurs BETWEEN categories
/ Means that independent variable (city type) explains most of variation in dependent variable (murder rate)
PARTITIONING VARIANCE
How much a single observation deviates from the grand mean
/ Mathematically we can divide the total deviation for a given observation (x_ik) into
1. Extent to which x_ik differs from its group mean, x̄_k (ie. difference WITHIN category)
2. Extent to which the group mean x̄_k differs from the grand mean, x̄ (ie. difference BETWEEN categories)
/ In symbols: (x_ik − x̄) = (x_ik − x̄_k) + (x̄_k − x̄)
PARTITIONING VARIANCE FOR SINGLE
OBSERVATION
We are interested in doing this for every observation in data
set
/ Dividing TOTAL variation across all observations into
variation BETWEEN categories & variation WITHIN
categories
/ We will use same methods to partition variance but this time
do it for all observations in data set
/ So we use sum of squares again
IN ENGLISH, PLEASE
WITHIN GROUP sum of squares is the sum of squared deviations of every raw score from its group mean
1. You are figuring out extent to which each raw score (x_ik) deviates from its group mean (x̄_k)
2. Squaring these deviation scores to get rid of negative signs
3. Then adding up the squared deviations from each observation within a given group
SS within = Σ (x_ik − x̄_k)²
MORE ENGLISH, PLEASE
In contrast, BETWEEN GROUP sum of squares is the sum of squared deviations of every group mean from the grand mean
1. You are figuring out extent to which each group mean (x̄_k) deviates from the grand mean (x̄)
2. Squaring these deviation scores to get rid of negative signs
3. Then weighting each squared deviation by the number of cases in that group (n_k) and adding across all groups
SS between = Σ n_k (x̄_k − x̄)²
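The two sums of squares can be sketched in Python for a small hypothetical data set (three city types with made-up murder rates, echoing the city-type example from these cards):

```python
# Hypothetical murder rates for three city types (made-up numbers).
groups = {
    "manufacturing": [10, 12, 14],
    "trade": [6, 8, 10],
    "government": [2, 4, 6],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)   # 8.0

# WITHIN-group SS: squared deviation of each raw score from its group mean.
ss_within = 0.0
for values in groups.values():
    group_mean = sum(values) / len(values)
    ss_within += sum((x - group_mean) ** 2 for x in values)

# BETWEEN-group SS: squared deviation of each group mean from the grand
# mean, weighted by the number of cases in the group.
ss_between = sum(
    len(values) * (sum(values) / len(values) - grand_mean) ** 2
    for values in groups.values()
)

print(ss_within, ss_between)   # 24.0 96.0
```

Note that the two pieces add up to the total variation: SS within (24) + SS between (96) = SS total (120).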
TESTING FOR ASSOCIATION WITH
ANOVA
How do we use this information to determine if
there is association between IV (type of city) &
DV (murder rates)?
If most of TOTAL VARIATION (SStotal) can be
attributed to variation WITHIN categories of IV
Then there is NO ASSOCIATION between IV
and DV
TESTING FOR ASSOCIATION WITH
ANOVA
How do we use this information to determine if
there is association between IV (type of city) &
DV (murder rates)?
/ If most of TOTAL VARIATION (SStotal) can be
attributed to variation BETWEEN categories of
IV
/ Then there is SIGNIFICANT ASSOCIATION
between IV and DV
step 1
CALCULATE MEAN FOR EACH
GROUP
step 2
CALCULATE WITHIN GROUP
SUM OF SQUARES
step 3
calculate between group sum of squares
STEP #4
CALCULATE DEGREES OF
FREEDOM (BETWEEN & WITHIN)
step 4
df between = k − 1, where k is number of categories in IV
/ df within = n − k, where n is number of cases & k is number of categories in IV
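With the hypothetical city-type example (9 cases across 3 categories), the two df values work out as:

```python
n = 9   # total number of cases (hypothetical example)
k = 3   # number of categories in the IV

df_between = k - 1   # 2
df_within = n - k    # 6
print(df_between, df_within)   # 2 6
```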
STEP #5
CALCULATE MEAN SQUARES
(BETWEEN & WITHIN)
step 5
Transform sums of squares (which are
measures of variation) into measures of
VARIANCE
/Measures of VARIANCE (mean squares)
differ from gross measures of VARIATION
(sums of squares) because...
/VARIANCE (mean squares) takes into
account degrees of freedom (ie. sample
size & number of groups in IV)
mean squares between
MS between = SSbetween / df between
mean squares within
MS within = SSwithin / df within
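Continuing the hypothetical example (SS between = 96, SS within = 24, with df of 2 and 6), the mean squares are:

```python
ss_between, ss_within = 96.0, 24.0   # hypothetical sums of squares
df_between, df_within = 2, 6         # k - 1 and n - k for 3 groups, 9 cases

ms_between = ss_between / df_between   # 48.0
ms_within = ss_within / df_within      # 4.0
print(ms_between, ms_within)   # 48.0 4.0
```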
STEP #6
CONDUCT HYPOTHESIS TEST
/ Follow same 5 steps we have been using for hypothesis testing
1. State null & alternative hypotheses
2. Determine alpha level
3. Determine critical value of F
4. Compute test statistic (in this case, use F test)
5. Compare observed F to critical F & state conclusion
STEP #1
state hypothesis
step 2
determine alpha level
step 3
find critical f
/ If df fall between two listed values, use SMALLER df
/ If df is greater than largest listed value (> 20 in numerator or > 1000 in denominator), use infinity for that component
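The "use the smaller df" rule can be sketched with a tiny excerpt of a standard F table (the two critical values below, 5.14 for df = (2, 6) and 3.89 for df = (2, 12) at α = .05, are standard table entries; the `look_up` helper is a hypothetical illustration, not a library function):

```python
# Excerpt of a standard F table: alpha = .05, 2 df in the numerator.
# Keys are denominator (within) df; values are critical F.
critical_f = {6: 5.14, 12: 3.89}

def look_up(df_within):
    """If df falls between two listed values, use the SMALLER listed df."""
    usable = [df for df in critical_f if df <= df_within]
    return critical_f[max(usable)]

print(look_up(6))    # 5.14 -- exact match in the table
print(look_up(10))   # 5.14 -- 10 falls between 6 and 12, so use 6
```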
step 4
calculate observed f
/ The HIGHER the ratio is, the more variance can be attributed to differences BETWEEN categories
/ The LOWER the ratio is, the more variance can be attributed to differences WITHIN categories
F observed = MS between / MS within
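For the hypothetical example, the observed F is simply the ratio of the two mean squares:

```python
ms_between, ms_within = 48.0, 4.0   # hypothetical mean squares from earlier steps

f_observed = ms_between / ms_within
print(f_observed)   # 12.0
```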
step 5
compare critical f to observed f
step 5
Remember, with ANOVA we are testing whether
between group variance is greater than within
group variance
/ We want to know whether observed value of F is
relatively large
/If observed F is greater than critical F, we will
reject H0 and conclude there is an association
between independent & dependent variables
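The decision rule for the hypothetical example (observed F = 12.0; critical F = 5.14 is the standard table value for α = .05 with df = (2, 6)) can be sketched as:

```python
f_observed = 12.0   # hypothetical, from MS between / MS within
f_critical = 5.14   # standard F table value for alpha = .05, df = (2, 6)

reject_h0 = f_observed > f_critical
print("reject H0" if reject_h0 else "fail to reject H0")   # reject H0
```

Since 12.0 > 5.14, we reject H0 and conclude there is an association in this made-up data.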
STRENGTH OF ASSOCIATION
Once we know that there is SIGNIFICANT
association between IV & DV, we need to
estimate STRENGTH of association
/ This is important because it is possible for
associations that exist in population to differ in
how strong (or important) they are
/ In fact, relatively weak association can be
significant if sample size is large enough
STRENGTH OF ASSOCIATION
Measure strength of association in ANOVA using
eta squared (η2)
/ Indicates proportion of total variation that is due to (explained by) independent variable
/ η² = SS between / SS total
STRENGTH OF ASSOCIATION
Interpretation: 6.66 % of total variation in
dependent variable (reading comprehension) is
explained by independent variable (type of
school)
/Thus, association between reading
comprehension & type of school is significant but
WEAK.
WHAT LEVEL OF ETA IS CONSIDERED
STRONG OR WEAK?
< 10% = weak
10%-25% = moderate
> 25% = strong
/ Remember, this is also dependent upon your research question, hypotheses, units used to measure variables & expected effect size
WHAT'S UP WITH ANOVA...IS IT ONE-TAILED OR TWO-TAILED?
ANOVA is an OMNIBUS test, meaning that it
just tests OVERALL differences
/There really isn't one-tailed vs. two-tailed option
with ANOVA (or F distribution)
/ F test is one-tailed. We reject H0 if observed F is greater than critical F
/ However, ANOVA really tests a two-tailed hypothesis because we are testing whether there is a significant difference between groups (we do not state a specific directional difference)
In Other Words...
Significant F test only tells us that at least two of
groups are significantly different on DV
/ But we cannot tell which two are different
/ Could conduct t-test of difference between two
means to determine which two groups are
significantly different from each other
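A follow-up comparison like the one described can be sketched as an equal-variances two-sample t test on two of the hypothetical groups (the data are made up; note that in practice, running many pairwise t-tests inflates the chance of a Type I error):

```python
from math import sqrt

# Hypothetical follow-up: compare two of the groups from the ANOVA example.
manufacturing = [10, 12, 14]
government = [2, 4, 6]

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    """Sample variance with n - 1 in the denominator."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n1, n2 = len(manufacturing), len(government)
# Pooled variance for an equal-variances two-sample t test.
pooled = ((n1 - 1) * sample_var(manufacturing) +
          (n2 - 1) * sample_var(government)) / (n1 + n2 - 2)
t = (mean(manufacturing) - mean(government)) / sqrt(pooled * (1 / n1 + 1 / n2))
print(round(t, 2))   # 4.9
```

The observed t would then be compared to a critical t with n1 + n2 − 2 degrees of freedom, just as in the two-means tests covered earlier.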
NORMAL DISTRIBUTION &
EQUALITY OF VARIANCES
When using ANOVA, we assume that the
dependent variable is normally distributed
/ However, if sample size is large enough, we can
relax this assumption because of CLT
/ Equal variances? ANOVA assumes that in
population, variance of DV is equivalent across
groups. Sample variances may not be exactly
equal. If they are close enough, F test will be
valid
/ Nonequivalence of variances only makes a
difference when working with small sample sizes
(not common in Sociology)