
significance test

a formal procedure for comparing observed data with a claim or hypothesis whose truth we want to assess

claim is a

statement about a parameter, like the population proportion p or the population mean mu

say the probability of the event that he makes a minimal amount GIVEN THAT he really makes the higher amount he claims on average

Conditional Probability --> if this estimated probability is low, the claim was likely false, and the amount he makes on average is actually lower than he says it is

Two Things can occur when given evidence:

1) the claim is correct and the sample was simply a bad sample

2) the population proportion/mean is actually lower or higher than is stated

explanation 1 COULD be correct, but if it's

unlikely that that outcome occurred, judging by the calculated conditional probability, then explanation 1 is probably false

****JUDGE USING THE 5% RULE****

***an outcome that would rarely happen if a claim were true is

good evidence that the claim is not true***
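The 5% rule can be illustrated with a quick simulation sketch (all numbers here are hypothetical: a claim of p = 0.8, and a sample of n = 50 that showed only 32 successes). We estimate how often a result this extreme would occur if the claim were really true:

```python
import random

random.seed(1)  # for reproducibility

# Hypothetical numbers: the claim says p = 0.8, but our sample of
# n = 50 showed only 32 successes.  Estimate P(32 or fewer | claim true).
p_claim, n, observed = 0.8, 50, 32
trials = 10_000

count = 0
for _ in range(trials):
    successes = sum(random.random() < p_claim for _ in range(n))
    if successes <= observed:
        count += 1

est_prob = count / trials
# By the 5% rule: if est_prob < 0.05, the sample is good evidence
# against the claim.
```

Here the estimated probability comes out well under 5%, so by the rule we would doubt the claim.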

Null Hypothesis (Ho, read "H-naught")

the claim tested by a statistical test; the statement of "no difference"; the claim we seek evidence against

Alternative Hypothesis (Ha)

the claim about the population we're trying to find evidence to support; if the evidence is convincing, we reject the null Ho

The hypothesis MUST be made before looking at the data, not

looking at the data first and then fitting a hypothesis to it

an Alternative Ha is ONE-SIDED if

it states that a parameter is LARGER/SMALLER than the null hypothesis value

it is TWO-SIDED if it states that

the parameter is DIFFERENT from the null hypothesis (either larger or smaller)

Null Hypothesis has the form:

Ho: parameter = value

Alternative Hypothesis has three forms:

Ha: parameter > value

Ha: parameter < value

Ha: parameter is NOT= to value

** ALL the hypotheses refer to the

POPULATION and NOT the sample, so always state Ho and Ha in terms of population parameters, NOT sample statistics like phat or xbar

P-value

the probability, assuming that Ho is true, that the statistic (such as phat or xbar) would take a value as extreme as or more extreme than the one actually observed.

the SMALLER the P-value, the

stronger the evidence against Ho is --> the observed result is unlikely to occur when Ho is true

larger P-values fail to give good evidence b/c they say that

the observed result is likely to occur by chance when Ho is true

the alternative hypothesis sets the direction that counts as evidence against Ho -->

if one-sided - only low/high counts as evidence against

if two-sided - both count as evidence against

if we can't prove a hypothesis wrong, that doesn't mean it's true, it simply means that

the data are consistent with Ho

never write "accept Ho" because

just because there's not enough evidence to prove its guilt, doesn't mean it's innocent. --> always say "reject" or "fail to reject"

If the P-value is smaller than alpha, we say that the data are STATISTICALLY SIGNIFICANT AT LEVEL "ALPHA" -->

reject the null hypothesis and conclude that there is convincing evidence in favor of the alternative hypothesis Ha

"Significant" means that it's

NOT LIKELY TO OCCUR BY CHANCE ALONE, not that it's important

the actual P-value is more informative than a statement of significance because

it allows us to assess significance at any level we choose (a result of P = 0.03 is significant at the alpha = 0.05 level but not at the alpha = 0.01 level)

When we use a fixed significance level to draw a conclusion in a statistical test,

P-value < alpha --> reject Ho, can conclude Ha

P-value > alpha --> fail to reject, cannot conclude Ha

most commonly used significance:

alpha = 0.05

if going to draw a conclusion based on statistical significance, the

significance level alpha must be stated BEFORE the data are produced (otherwise someone could set the alpha level after data have been analyzed in the attempt to manipulate the conclusion)

the purpose of a significance test is to

give a clear statement of the strength of evidence provided by the data against the null hypothesis. the P-value does this.

How small a P-value is convincing evidence against the null hypothesis?

1) How plausible is Ho --> if common misconception, need strong evidence (really small P-value!) to convince them otherwise

2) What are consequences of rejecting Ho? --> If rejecting Ho, it'll mean making an expensive change --> need strong evidence that it'll be beneficial

giving the P-value allows each of us to decide

individually if the evidence is strong enough

There is NO practical distinction between P-values 0.049 and 0.051 without the

decided alpha level 0.05 --> former causes us to reject Ho, latter causes us to fail to reject Ho

Type I error

reject when Ho is true

Type II error

fail to reject when Ho isn't true

deciding which error is more serious depends on

the context of the question

we can assess the performance of a significance test by looking at the probabilities of

the two types of error

***The significance level alpha of any fixed level test is the probability of a Type I error, meaning,

alpha is the probability that the test will reject the null hypothesis Ho when Ho is in fact true. CONSIDER THE CONSEQUENCES OF A TYPE I ERROR BEFORE CHOOSING THE SIGNIFICANCE LEVEL!

significance test makes a Type II error when it fails to reject

a null hypothesis that is really false

A high probability of a Type II error for a particular alternative means that the test is not

sensitive enough to usually detect that alternative

**In the significance test setting, it is more common to report the probability that a test DOES reject Ho

when an alternative is true --> "power" of the test against that specific alternative --> the higher the probability, the more sensitive the test

the POWER of a test against a specific alternative is

the probability that the test will reject Ho at a chosen significance level alpha when the specified alternative value of the parameter is true

Type II error and Power are

closely linked

the power of a test gives

the probability of detecting a specific alternative value of the parameter --> the choice of that alternative value is made by someone with a vested interest in the situation

power of a test is a number between

0 and 1

power close to 0 -->

the test has almost no chance of detecting that Ho is false when that specific alternative is true

Significance Level of a Test:

the probability of reaching the wrong conclusion when the null hypothesis is true

the power of a test to detect a specific alternative is the

probability of reaching the right conclusion when that alternative is true --> THE PROBABILITY OF NOT MAKING A TYPE II ERROR

the power of a test against any alternative is 1 minus the

probability of a Type II error for that alternative --> power = 1-B
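The power = 1 - B relationship can be worked through numerically. A sketch with hypothetical numbers (Ho: p = 0.5 vs Ha: p > 0.5, alpha = 0.05, n = 100, and the specific alternative p = 0.6), using the stdlib error function for the standard Normal CDF:

```python
from math import sqrt, erf

def norm_cdf(x):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Hypothetical setup: Ho: p = 0.5 vs Ha: p > 0.5, alpha = 0.05,
# n = 100, and we want the power against the alternative p = 0.6.
p0, pa, n = 0.5, 0.6, 100
z_alpha = 1.645  # one-sided critical value for alpha = 0.05

sd0 = sqrt(p0 * (1 - p0) / n)  # sd of p-hat if Ho is true
sda = sqrt(pa * (1 - pa) / n)  # sd of p-hat if p really is 0.6

cutoff = p0 + z_alpha * sd0    # reject Ho when p-hat >= this value
power = 1 - norm_cdf((cutoff - pa) / sda)
beta = 1 - power               # Type II error probability: power = 1 - B
```

With these made-up numbers the power comes out around 0.64, so the test would miss this alternative about 36% of the time.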

How large a sample/how many observations do we need to make to carry out the Significance Test?

1) Significance Level: how much protection we want against a Type I error (getting a significant result from our sample when Ho is true)?

2) Practical Importance: how large a difference between hypothesized parameter value and the actual parameter value is important in the practice?

3) Power: how confident do we want to be that our study will detect a difference of the size we think is important?

***Decreasing the Type I error probability alpha

INCREASES the Type II error probability B

(OPPOSITE IS ALSO TRUE)

the smaller the significance level, the

larger the sample size needed (a smaller significance level requires stronger evidence to reject the null hypothesis)

the higher the power, the

larger the sample needed (higher power gives a better chance of detecting a difference when it is really there)

at any significance level and desired power,

detecting a small difference requires a larger sample than detecting a large difference

to maximize the power of a test,

choose as high an alpha level (Type I error probability) as you are willing to risk AND as large a sample as you can afford

9.2

STARTS HERE!

3 conditions must be met before conducting a significance test:

- Random

- Normal

- Independent

test statistic

says how far the sample result is from the null parameter value, and in what direction, on a standardized scale (Normal)

test statistic =

(statistic - parameter) / (standard deviation of the statistic)

when the conditions are met (RNI) the sampling distribution of p-hat is

approximately Normal with mean of p-hat = p and S.D. of p-hat = sqrt(p(1-p)/n)

just like last time, to get the STANDARD ERROR, sub in p-hat for p in the

standard deviation formula

z = (p-hat - p0) / sqrt(p0(1-p0)/n)

***this z-statistic has approx. Normal distribution when

Ho is true

One-Sample "z" Test for a Proportion:

Choose an SRS of size n from a large population that contains an unknown proportion p of successes. To test the hypothesis Ho: p = p0, compute the z statistic:

z = (p-hat - p0) / sqrt(p0(1-p0)/n)
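A worked sketch of the one-sample z test with hypothetical data (Ho: p = 0.50 vs Ha: p > 0.50, with 58 successes in an SRS of n = 100), using the stdlib error function for the standard Normal CDF:

```python
from math import sqrt, erf

def norm_cdf(x):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Hypothetical data: test Ho: p = 0.50 vs Ha: p > 0.50
# with 58 successes in an SRS of n = 100.
p0, n, successes = 0.50, 100, 58
p_hat = successes / n

# The standard deviation uses p0 (the null value), NOT p-hat.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# One-sided P-value: area to the right of z under the standard Normal.
p_value = 1 - norm_cdf(z)
```

Here z = 1.6 and the P-value is about 0.055, which is NOT significant at alpha = 0.05, so we would fail to reject Ho.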

see yellow box on page

553

*if the Normal condition and CLT don't apply,

can't do this.

if the evidence from the sample proportion doesn't support the Ha,

there's no need to do a significance test

Steps for Significance Tests:

1) what hypothesis are you using? what significance level? what parameters?

2) choose the method and check the conditions

3) compute the test statistic

4) find the P-value

when conditions are met, sampling distribution of p-hat is approx. Normal with

mean of p-hat = p

and

standard deviation of p-hat = sqrt(p(1-p)/n)

for confidence intervals, sub in p-hat for p

in the sd formula to get the standard error

BUT in a significance test,

null hypothesis specifies a value for p, which we will call p0 ("p-naught")

test statistic:

z = (p-hat - p0) / sqrt(p0(1-p0)/n)

the z statistic has an approx. standard Normal distribution when Ho is true -->

P-values come from the standard Normal distribution

One-Sample Z Test for a Proportion:

choose an SRS of size n from a large population that contains an unknown proportion p of successes --> test the hypothesis Ho: p = p0

--> compute the z-statistic

--> find the P-value: the probability of getting a z statistic this large or larger in the direction specified by the Ha hypothesis

Ha: p > p0

shade right tail

Ha: p < p0

shade left tail

Ha: p not = to p0

shade both tails
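The three shading rules translate directly into three P-value calculations. A sketch with a hypothetical observed statistic z = 1.8:

```python
from math import sqrt, erf

def norm_cdf(x):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

z = 1.8  # hypothetical observed z statistic

p_right = 1 - norm_cdf(z)           # Ha: p > p0  (shade right tail)
p_left = norm_cdf(z)                # Ha: p < p0  (shade left tail)
p_two = 2 * (1 - norm_cdf(abs(z)))  # Ha: p != p0 (shade both tails)
```

Note the two-sided P-value is exactly double the matching one-sided tail, which is why a two-sided test needs a more extreme statistic to reach the same significance level.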

*****USE THIS TEST ONLY WHEN:

the expected counts of successes and failures, n(p0) and n(1-p0), are both AT LEAST 10, and the 10% condition is met
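Those two checks are simple enough to write down directly. A sketch with hypothetical numbers (Ho: p = 0.5, a sample of n = 100, and an assumed population size of N = 5000):

```python
# Hypothetical check of the conditions before running the test:
# Ho: p = 0.5 with a sample of n = 100 from a population of N = 5000.
p0, n, N = 0.5, 100, 5000

# Normal condition: expected counts of successes and failures >= 10
normal_ok = n * p0 >= 10 and n * (1 - p0) >= 10

# 10% condition: sample is at most 10% of the population
ten_percent_ok = n <= 0.10 * N
```

Both checks pass here (50 expected successes/failures, and 100 <= 500), so the one-sample z test would be appropriate.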

if you get a sample proportion greater than 0.08, it's clear that

you need to do a SIGNIFICANCE TEST --> find the P-value

if the P-value is greater than 5%, we can't

rule out sampling variability as a cause, and must fail to reject Ho

IF THE PROPORTION IS NOT greater than 0.08,

we don't need to do a significance test b/c the sample gives no evidence for Ha --> we already fail to reject Ho

the probability of making a Type I error is equal to

alpha, and SO, if Type I error is the one you want to make the least, you should opt for a smaller significance level

A greater risk of a Type I error means a smaller risk of making a

Type II error and a HIGHER POWER to detect a specific alternative value of the parameter.

If they're equally bad, use the standard significance level,

0.05

*****When you reverse the success and failure, you are

changing the sign of the test statistic z. The p-value remains the same, and OUR CONCLUSION DOES NOT DEPEND ON OUR INITIAL CHOICES OF SUCCESS AND/OR FAILURE

confidence interval:

p-hat +/- z* × sqrt(p-hat(1 - p-hat)/n)

the confidence interval gives an

approximate range of p0 values that would NOT be rejected by a two-sided test at the 0.05 significance level (therefore a range of plausible values for the true population parameter p)

100(1-significance level)%

___% confidence interval

standard deviation is still

sqrt(p-hat(1 - p-hat)/n)
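The confidence interval formula is a short computation. A sketch with hypothetical data (58 successes in n = 100, 95% confidence, so z* = 1.96):

```python
from math import sqrt

# Hypothetical sample: 58 successes in n = 100, 95% confidence (z* = 1.96).
n, successes, z_star = 100, 58, 1.96
p_hat = successes / n

# Standard error: sub p-hat in for p in the sd formula.
se = sqrt(p_hat * (1 - p_hat) / n)

# Interval: p-hat +/- z* × se
lower, upper = p_hat - z_star * se, p_hat + z_star * se
```

With these numbers the interval is roughly (0.48, 0.68); since it contains 0.50, a two-sided test of Ho: p = 0.50 at alpha = 0.05 would fail to reject.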

a two-sided test at Ho:p = po at significance level alpha gives

roughly the same conclusion as a 100(1-alpha)% confidence interval

P-value is LESS THAN alpha

we reject Ho

P-value is GREATER THAN alpha

we fail to reject Ho

9.3 BEGINS HERE!!!!!!!!!!!!!!!!!!

if the sample size is small, it's still possible to use this test, it just means that

we have to look at the data to tell whether we can use it (e.g., a HISTOGRAM to check the shape)

test Ho: mu = mu0, statistic is sample mean

xbar

its standard deviation is

sigma of xbar = sigma / sqrt(n)

ideal world test statistic:

z = (xbar - mu0) / (sigma / sqrt(n))

BUT because the actual standard deviation is

unknown, we have to put the SAMPLE s.d. in its place
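Substituting the sample s.d. for sigma looks like this in a sketch with hypothetical data (testing Ho: mu = 10):

```python
from math import sqrt

# Hypothetical sample for testing Ho: mu = 10.
data = [9.6, 10.4, 9.8, 10.9, 10.1, 9.5, 10.7, 10.2]
n = len(data)
xbar = sum(data) / n

# Sample standard deviation s stands in for the unknown sigma.
s = sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

# Because s replaces sigma, this is a t statistic rather than z.
t = (xbar - 10) / (s / sqrt(n))
```

Using s instead of sigma adds extra variability, which is why the resulting statistic follows a t distribution rather than the standard Normal.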

question on page

568