Stats Exam 2- Ch. 8-11 Thought Questions
What are the three assumptions that underlie parametric tests?
1. The dependent variable is assessed using a scale measure; this assumption is not met if the data are nominal or ordinal (usually OK)
2. The participants were randomly selected; when this assumption is violated, we must be cautious when generalizing from the sample to the population (OK)
3. The underlying population should have a normal distribution; if it doesn't, a sample size of at least 30 usually makes the distribution of means approximately normal (CLT) (OK if the sample size is at least 30)
-Meeting the assumptions improves the quality of research, but not meeting them doesn't necessarily invalidate it
What is the difference between parametric and nonparametric tests?
Parametric tests are inferential statistical analyses based on a set of assumptions about the population
Nonparametric tests are inferential statistical analyses that are not based on a set of assumptions about the population
What does it mean when we say that a statistical test is robust?
A robust statistical test is a hypothesis test that produces fairly accurate results even when the data suggest that the population might not meet some of the assumptions (so violating those assumptions doesn't automatically make the results invalid)
-some parametric tests can be conducted even if some of the assumptions are not met and are robust against violations of some of the assumptions
What are the six steps of hypothesis testing?
1. Identify the population, comparison distribution, and assumptions
2. State the null and research hypotheses
3. Determine the characteristics of the comparison distribution
4. Determine the critical values or cutoffs
5. Calculate the test statistic
6. Make a decision
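The six steps can be sketched for a two-tailed one-sample z test. This is a minimal sketch assuming σ is known; the function name `one_sample_z_test` and the numbers (µ = 100, σ = 15, n = 36, M = 106) are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Hypothetical helper: the six steps for a two-tailed one-sample z test."""
    # Step 1: population and comparison distribution (distribution of means);
    #         assumptions: scale DV, random selection, normal population or n >= 30
    # Step 2: H0: mu == mu0; H1: mu != mu0 (nondirectional)
    # Step 3: comparison distribution has mean mu0 and standard error sigma / sqrt(n)
    standard_error = sigma / n ** 0.5
    # Step 4: critical values (cutoffs) for a two-tailed test
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 5: test statistic
    z = (sample_mean - mu0) / standard_error
    # Step 6: decision
    reject = abs(z) > z_crit
    return z, z_crit, reject

z, z_crit, reject = one_sample_z_test(sample_mean=106, mu0=100, sigma=15, n=36)
print(f"z = {z:.2f}, cutoffs = ±{z_crit:.2f}, reject H0: {reject}")
```

Here z = 6 / 2.5 = 2.40, which is beyond ±1.96, so the null is rejected.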
For the population SS/N=σ², but for a sample SS/n is a biased estimator of σ². Explain what that means and why it occurs. Why would SS/n typically underestimate the population variance?
SS/n is biased because, on average, it comes out smaller than σ². In a sample, the deviations are measured from the sample mean M rather than from µ, and M is the value that makes SS as small as possible for that sample; so SS around M is almost always smaller than SS around µ. Small samples are affected the most.
To get an unbiased estimator of the variance, we divide by n - 1 instead of n, which corrects for this tendency to underestimate the spread in the population
-The result is still a mean of squared deviations (the estimated population variance, s²), just computed with n - 1 in the denominator
-Dividing by a smaller number gives a slightly larger estimate that is, on average, equal to σ²
-n - 1 is the degrees of freedom: once M is known, only n - 1 scores are free to vary, because the deviations from M must sum to zero
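A quick simulation makes the bias visible. This sketch assumes a hypothetical normal population with σ² = 100 and samples of n = 5; averaged over many samples, SS/n should land near σ²(n - 1)/n = 80, while SS/(n - 1) should land near 100.

```python
import random

random.seed(1)
mu, sigma, n, reps = 0.0, 10.0, 5, 20000   # hypothetical population: sigma^2 = 100

total_biased = 0.0
total_unbiased = 0.0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)   # deviations from the SAMPLE mean, not mu
    total_biased += ss / n
    total_unbiased += ss / (n - 1)

avg_biased = total_biased / reps       # expected to be near 100 * (n - 1) / n = 80
avg_unbiased = total_unbiased / reps   # expected to be near 100
print(f"average SS/n     = {avg_biased:.1f}")
print(f"average SS/(n-1) = {avg_unbiased:.1f}")
```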
Dividing SS by "n-1" results in a "sliding" adjustment, yielding an unbiased estimate of σ². Explain the adjustment made by "n-1", especially when it relates to sample size.
Dividing by n - 1 instead of n makes the estimate larger by a factor of n/(n - 1). The result is our estimated population variance (the mean of squared deviations, computed with df in the denominator)
-The adjustment "slides" with sample size: with n = 2 the estimate is doubled, but with n = 100 it increases by only about 1% (for the same reason, as N increases the t distribution approaches the z distribution)
-This makes sense because a larger sample is more likely to resemble the population than a smaller sample, so it needs less correction
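The "sliding" part of the adjustment is the ratio n/(n - 1), which shrinks toward 1 as n grows. A tiny sketch; the helper name `correction_factor` is hypothetical:

```python
def correction_factor(n):
    """How many times larger SS/(n-1) is than SS/n for the same sample."""
    return n / (n - 1)

for n in (2, 5, 10, 30, 100, 1000):
    print(f"n = {n:4d}: SS/(n-1) is {correction_factor(n):.3f} times SS/n")
```

This prints 2.000, 1.250, 1.111, 1.034, 1.010, and 1.001: a big correction for tiny samples and almost none for large ones.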
Be able to explain how the CLT permits us to estimate the position of µ if we only know sample information? Why is it that we can take info from one sample and infer things about the population?
If the sample size is large enough and each observation is independent of the others, then the distribution of sample means will be approximately normal in shape, even if the population itself is not normally distributed. The CLT lets us use this distribution of means to locate µ from a single sample mean:
-A distribution of means has the same mean as the population (µ_M = µ) and is less variable than a distribution of scores (σ_M = σ/√n)
-So a single sample mean is likely to fall close to µ, and the standard error tells us how close
-A randomly selected sample tends to be representative of its population, which is what lets us generalize from it
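The CLT can be checked by simulation. This sketch assumes a hypothetical exponential population (mean 1, SD 1, strongly skewed) and draws 5,000 samples of n = 36; the mean of the sample means should be close to 1 and their SD close to σ/√n ≈ 0.167.

```python
import random
import statistics

random.seed(2)
n, reps = 36, 5000

# Hypothetical, strongly right-skewed population: exponential with mean 1 and SD 1
sample_means = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)   # standard error, estimated by simulation
print(f"mean of sample means = {mean_of_means:.3f} (population mean = 1)")
print(f"SD of sample means   = {sd_of_means:.3f} (sigma/sqrt(n) = {1 / n ** 0.5:.3f})")
```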
What is the difference between the scientific and statistical hypotheses? Explain what is represented by the null and alternate hypotheses? Be able to generate examples of each. Specify attributes of each. Why do we focus so much on the null hypothesis for hypothesis testing?
A scientific hypothesis is a formulated statement of something you believe to be true based on prior knowledge or experience (If... then...). You design experiments around your hypothesis and hope to collect enough data to support it.
Statistical hypotheses are the null and research hypotheses, stated in terms of population parameters. Statistics can only show that the data support (or fail to support) a relationship; they cannot prove it
The null hypothesis is a statement that postulates that there is no difference between populations or that the difference is in a direction opposite to that anticipated by the researcher
-a population parameter is equal to a certain value that is usually determined by previous knowledge or results
-we focus on the null because it is the hypothesis we actually test: we ask whether the difference between our sample and what the null predicts is large enough to conclude that there's likely a real difference in the underlying populations
The research (alternative) hypothesis is a statement that postulates a difference between populations and sometimes this difference can be positive or negative in direction
-states that the population parameter is actually different from the value specified in the null hypothesis
-what you believe to be true or hope to support
ex. Null: students who take an ACT prep course have the same mean score as the national average; research: their mean score differs from the national average
Explain the difference between directional and non-directional hypotheses. What are the implications of using a one-tailed vs. two tailed hypothesis test? When should each be used?
A nondirectional alternative hypothesis states that the null hypothesis is wrong. A nondirectional alternative hypothesis does not predict whether the parameter of interest is larger or smaller than the reference value specified in the null hypothesis. (two-tailed)
-you are interested at both extremes
A directional alternative hypothesis states that the null hypothesis is wrong, and also specifies whether the true value of the parameter is greater than or less than the reference value specified in the null hypothesis. (one-tailed)
-you only care about one extreme
The advantage of using a directional hypothesis is increased power to detect the specific effect you are interested in. The disadvantage is that there is no power to detect an effect in the opposite direction.
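The cutoffs for a one-tailed vs. two-tailed z test at α = 0.05 show where the extra power of a directional test comes from. A minimal sketch using Python's standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist()                         # mean 0, SD 1
one_tailed = standard_normal.inv_cdf(1 - alpha)        # all 5% in one tail
two_tailed = standard_normal.inv_cdf(1 - alpha / 2)    # 2.5% in each tail
print(f"one-tailed cutoff:  {one_tailed:.3f}")          # 1.645
print(f"two-tailed cutoffs: ±{two_tailed:.3f}")         # ±1.960
```

The one-tailed cutoff (about 1.645) is easier to pass in the predicted direction than the two-tailed cutoff (about 1.96), but a one-tailed test has no cutoff at all on the other side.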
What is meant by a Type I error? (Be sure that you can describe it in practical terms, for a real investigation, not just referring to the null hypothesis)
A Type I error is one in which we reject the null hypothesis but the null hypothesis is actually correct (False Positive) (The probability of making a type I error is alpha)
ex. We think someone has a disease when they really don't
-this scenario is particularly detrimental because people often take action based on a mistaken finding
-someone who thinks they are pregnant (they actually aren't) starts to tell everyone and starts to buy baby clothes
If α = 0.05, what does that mean in practical terms? Why don't we set alpha even lower?
This value of alpha means that if the null hypothesis is actually true, there is a 5% chance that we will wrongly reject it (a Type I error). We could lower that risk by setting alpha lower, but doing so makes the null harder to reject, so we would be less likely to detect a true difference when one exists (more Type II errors, less power)
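What α = 0.05 means in practice can be shown by simulating many studies in which the null hypothesis is true. This is a sketch under assumed, hypothetical values (µ = 100, σ = 15, n = 25, two-tailed z test); the share of studies that reject the null should come out close to 0.05.

```python
import random
from statistics import NormalDist, fmean

random.seed(3)
mu0, sigma, n, alpha, reps = 100, 15, 25, 0.05, 20000   # hypothetical study
z_crit = NormalDist().inv_cdf(1 - alpha / 2)            # two-tailed cutoff, about 1.96
se = sigma / n ** 0.5

false_positives = 0
for _ in range(reps):
    # H0 is TRUE here: every sample really comes from a population with mean mu0
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / se
    if abs(z) > z_crit:
        false_positives += 1   # rejecting a true null = Type I error

type_i_rate = false_positives / reps
print(f"share of studies that wrongly reject H0: {type_i_rate:.3f}")
```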
Why do scientists want to minimize Type I errors? Think about potential costs associated with a Type I error
Positive (significant) outcomes are more likely to be reported than null results, so a Type I error is likely to be published and acted on; this is why Type I errors are usually considered worse than Type II errors. Taking action based on a false finding usually costs more than failing to act (Type II)
-Usually the most exciting, headline-grabbing studies get published, and "boring" null results get tossed to the wayside
What is meant by a Type II error? What are some factors that influence the likelihood of this error?
A Type II error is one in which we failed to reject the null but we should have actually rejected the null (False Negative) (The probability of making a Type II error is beta)
ex. We think someone does NOT have a stress fracture when they actually do (cough...cough...Orwin)
-this results in a failure to take action and certain situations can get worse if left untreated
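-These are usually less serious errors to make, but they can still be harmful (ex. a pregnant woman starts to drink because she doesn't think she is pregnant)
Factors: small sample size, large variability (standard deviation), a small effect (difference to be detected), a low alpha, and a two-tailed test; each of these lowers power, and β = 1 - power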
Explain the way(s) in which the experimenter influences or has control over Type I or II errors (direct and indirect)
You can replicate the study to see whether you get the same results before reporting your findings; you can also calculate the power of your test beforehand so you know ahead of time the likelihood of rejecting the null hypothesis if a real effect exists
-Type I (direct): the experimenter sets the alpha level
-Type II (indirect): the choice of sample size, directional vs. nondirectional test, and dependent vs. independent groups design
How are Type I and Type II errors different from experimental biases (malfeasance) in the conduct of research?
Experimental biases can be intentional or unintentional and should be avoided or minimized as much as possible. They act as confounding variables in a study, but sometimes they are simply unavoidable; if they did have an effect in your study, then you should report their influence in your analysis report
a. Biases are the researcher's fault (flaws in how the study was designed or conducted)
b. Type I and Type II errors are not mistakes in processing the numbers; they result from sampling variability (chance) and can happen even in a perfectly conducted study
-Because any decision based on a sample could be wrong, Type I and Type II errors are statistical risks that we manage (through alpha and power), not experimental misconduct
Explain the connection between sampling distributions and hypothesis testing
A sampling distribution (such as the distribution of means) shows the values a sample statistic could take, and how likely each is, across many samples of the same size from a population. In hypothesis testing, we build the sampling distribution we would expect if the null hypothesis were true and then see where our own sample's statistic falls on it
-this lets us determine the probability of getting a result at least as extreme as ours if the null were true
* We use sampling distributions because it would be too difficult to measure the entire population; they let us generalize from a sample to the population
* Without the CLT, hypothesis testing as we do it would not be possible: the CLT tells us the distribution of means is approximately normal and centered on the population mean stated in the null hypothesis, with a spread equal to the standard error
What is meant by "statistically significant"? What is meant by "p<0.05"? This is a probability of what? Under what circumstances do we make such a statement?
A result is statistically significant when it would be unlikely to occur by chance if the null hypothesis were true, so we reject the null. Significance means we are fairly confident that a difference is real; it does NOT mean the difference is large or important. With a large sample, even a very small difference can be statistically significant.
-p is the probability of getting a result at least as extreme as ours if the null hypothesis were true
-we state "p < 0.05" when that probability is below our alpha level of 0.05:
If p < 0.05: the probability is small that a difference or relationship this large happened by chance; your finding is significant and you would reject the null hypothesis
-the test statistic is more extreme than the critical value (it falls in the critical region)
If p > 0.05: the probability is high that a difference or relationship this large could happen by chance; your finding is not significant and you would fail to reject the null
-the test statistic is less extreme than the critical value
What is a confidence interval and what does it reveal? What is it centered around? How is it related to hypothesis testing? How is it different?
The confidence interval is an interval based on the sample statistic and it includes the population mean a certain percentage of the time if we sample from the same population repeatedly
-a 95% CI is most common (calculate the upper and lower limits of the CI using your sample mean, the z stat, and standard error)
-the CI confirms the result of a two-tailed hypothesis test while also adding detail: if the value stated in the null hypothesis falls outside the 95% CI, we reject the null; if it falls within the CI, we fail to reject the null
-the CI is centered around the sample mean rather than the population mean
-the confidence level is the percentage (e.g., 95%), while the confidence interval is the range of values between the lower and upper limits
-unlike a hypothesis test, which only gives a reject/fail-to-reject decision, the CI shows a range of plausible values for the population mean
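A 95% CI for a z test can be sketched as M ± z_crit × σ_M. This assumes σ is known; the function name `z_confidence_interval` and the numbers (M = 106, σ = 15, n = 36, null value 100) are hypothetical.

```python
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, level=0.95):
    """CI for mu when sigma is known: M ± z_crit * sigma / sqrt(n)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)   # 1.96 for a 95% CI
    se = sigma / n ** 0.5
    return sample_mean - z_crit * se, sample_mean + z_crit * se

lower, upper = z_confidence_interval(sample_mean=106, sigma=15, n=36)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
mu0 = 100   # value stated in the null hypothesis
print("fail to reject H0" if lower <= mu0 <= upper else "reject H0")
```

Here the null value of 100 falls outside the interval (about 101.10 to 110.90), which matches rejecting the null in a two-tailed test at α = 0.05.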
What does it mean to say a test is directional or nondirectional?
A directional test is a one-tailed test in which you predict that the sample will differ from the comparison mean in one specific direction (either above or below)
ex. You want to test whether your high school's average ACT score was above the national average
A nondirectional test is a two-tailed test in which you look for any difference, either positive or negative, from the comparison mean
ex. You want to see if your high school's average ACT score differed from the national average (it could be above or below)
What are the advantages and disadvantages of using a one-tailed test and why do we almost always use a two tailed test?
One-tailed test= a hypothesis test in which the research hypothesis is directional, positing either a mean decrease or a mean increase in the dependent variable
Two-tailed test= a hypothesis test in which the research hypothesis does not indicate a direction of the mean difference or change in the dependent variable; merely indicates that there will be a mean difference
-one-tailed tests should only be used when the researcher is certain that the effect cannot go in the other direction, which is rare (ex. you are certain your HS's ACT score can only be above the national average)
-we almost always use a two-tailed test because it is more conservative: alpha is split between both tails, so each cutoff is more extreme and the null is harder to reject, but an effect in either direction can still be detected (a one-tailed test gains power in one direction only)
Be able to describe Cohen's effect size. What is meant, in practical terms, for a small, medium, or large effect size?
Cohen's d is a measure of effect size that expresses the difference between two means in standard deviation units: the mean difference is divided by the standard deviation, not the standard error (e.g., d = (M - µ)/σ). It tells us HOW MUCH OF A DIFFERENCE THERE IS!
Small= 0.2 (85% overlap)
Medium= 0.5 (67% overlap)
Large= 0.8 (53% overlap)
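-effect size does not tell us whether a result is statistically significant; it tells us how large the difference is
-the overlap refers to the two distributions of scores and does not depend on sample size; a larger sample narrows the distributions of means (raising power) but does not change d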
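Cohen's d and the overlap figures above can be computed directly. The listed overlap percentages appear to correspond to 1 - Cohen's U1 for two normal curves with equal standard deviations; that interpretation and the helper names are assumptions of this sketch.

```python
from statistics import NormalDist

def cohens_d(mean_1, mean_2, sd):
    """Mean difference expressed in standard-deviation (not standard-error) units."""
    return (mean_1 - mean_2) / sd

def percent_overlap(d):
    """Overlap as 100 * (1 - Cohen's U1), assuming two normal curves with equal SDs."""
    phi = NormalDist().cdf(abs(d) / 2)
    u1 = (2 * phi - 1) / phi   # Cohen's U1: proportion of non-overlap
    return 100 * (1 - u1)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {percent_overlap(d):.0f}% overlap")
print(cohens_d(106, 100, 15))   # hypothetical means and SD
```

This prints about 85%, 67%, and 53% overlap, and d = 0.4 for the hypothetical means 106 vs. 100 with σ = 15.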
What does an effect size contribute that significance does not? Is it possible to derive significance, but have a weak effect size? When might this be more likely?
Effect size indicates the size of a difference and is unaffected by sample size, it tells us how much two populations DO NOT overlap
-the less overlap, the BIGGER the effect size (more difference between your two means)
-significance just tells us a difference exists while the effect size puts an actual number to it
-with large samples, a difference can be statistically significant but have a weak effect size; this becomes more likely as the sample size increases
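A large sample making a weak effect significant can be illustrated with a two-group z test. This sketch assumes σ is known, equal group sizes, and that the observed difference happens to equal a tiny true effect of d = 0.05; the helper name `two_group_z` is hypothetical.

```python
from statistics import NormalDist

def two_group_z(d, n_per_group):
    """z for an observed standardized difference d with n per group (sigma known)."""
    return d / (2 / n_per_group) ** 0.5

d = 0.05   # a tiny effect, well below Cohen's "small" (0.2)
for n in (50, 500, 5000, 20000):
    z = two_group_z(d, n)
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p value
    print(f"n = {n:5d} per group: z = {z:.2f}, p = {p:.4f}")
```

The effect size never changes, yet the result becomes significant once n per group reaches the thousands.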
What is meant by power? Why is it important? Be sure to understand our power diagram and be able to interpret variations.
Statistical power is a measure of the likelihood that we will reject the null, given that the null is false (the probability that we will reject the null when we SHOULD reject the null, i.e., that we will not make a Type II error); power = 1 - β
-As power increases, there is less of a chance of making a Type II error
-Power is important because a low-power study may fail to detect a real effect; researchers generally want power of at least 0.8 (80%) before conducting a study
What are three ways to increase the power of a statistical test?
1. Increase N (sample size), e.g., until power reaches 0.8
2. Increase alpha (e.g., from 0.05 to 0.10), although this increases the chance of making a Type I error from 5% to 10%
3. Change a two-tailed test into a one-tailed test, which increases power in the predicted direction
-you can also increase the mean difference between levels of the IV (a stronger manipulation) or reduce variability; either one decreases the overlap between the curves and leads to greater statistical power
Why must a specific alternative hypothesis be identified to calculate power? How does power hypotheses differ from our original (null and research) hypotheses?
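* Power is the probability of rejecting the null when a particular alternative is true, so we must name a specific value for the population mean (e.g., µ = 106 rather than just "µ ≠ 100") to locate the alternative distribution; without it, power cannot be calculated. The power hypothesis differs from our original research hypothesis, which only states that a difference exists, because it specifies the exact size of the difference. We use the critical z score and the null mean to calculate a critical mean, and power is the area of the alternative distribution beyond that critical mean.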
Be able to demonstrate the impact of several variables (e.g. α, µ, directional/nondirectional tests, discrepancy to be detected, variability) on power and β. Think about what each does for our power diagram.
Alpha: the significance level, usually 0.05 (sometimes 0.01 or 0.10); it is the probability of making a Type I error
Beta- this is the probability of making a Type II error and its value is dependent on the size of the effect, the sample size, and the chosen significance level (alpha); usually associated with the power of the test to detect an effect of a specific size
Alpha is what we use to set our critical values. N is the size of our sample. Directional vs. nondirectional tests determine whether alpha is placed entirely in one tail of the diagram or split between both tails.
· Sample size: as n increases, the standard error shrinks, the curves get narrower, and power increases
· Direction: switching from directional to nondirectional splits alpha between the two tails (α/2 per tail), so β increases and power decreases
· Discrepancy to be detected: this sets the position of the alternative curve; small differences are hard to detect (β increases, power decreases)
· Alpha: decreasing alpha moves the critical value (the "goal line") outward, so β increases and power decreases
· Variability: as variability increases, the curves get wider and overlap more, so power decreases
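The power-diagram effects can be sketched numerically for a one-sample z test: find the critical mean(s) under the null curve, then take the area of the alternative curve beyond them. This is a minimal sketch assuming σ is known; the function name `power_z` and the baseline values (µ0 = 100, µ1 = 106, σ = 15, n = 36) are hypothetical, and the one-tailed option assumes µ1 > µ0.

```python
from statistics import NormalDist

def power_z(mu0, mu1, sigma, n, alpha=0.05, tails=2):
    """Power of a one-sample z test against a specific alternative mean mu1."""
    se = sigma / n ** 0.5
    null = NormalDist(mu0, se)   # distribution of means if H0 is true
    alt = NormalDist(mu1, se)    # distribution of means if mu really equals mu1
    if tails == 1:
        crit_mean = null.inv_cdf(1 - alpha)   # single "goal line" in the upper tail
        return 1 - alt.cdf(crit_mean)
    lower_crit = null.inv_cdf(alpha / 2)
    upper_crit = null.inv_cdf(1 - alpha / 2)
    return alt.cdf(lower_crit) + (1 - alt.cdf(upper_crit))

base = dict(mu0=100, mu1=106, sigma=15, n=36)   # hypothetical study
print(f"baseline:     {power_z(**base):.2f}")
print(f"n = 64:       {power_z(**{**base, 'n': 64}):.2f}")
print(f"alpha = 0.01: {power_z(**base, alpha=0.01):.2f}")
print(f"one-tailed:   {power_z(**base, tails=1):.2f}")
print(f"mu1 = 103:    {power_z(**{**base, 'mu1': 103}):.2f}")
print(f"sigma = 25:   {power_z(**{**base, 'sigma': 25}):.2f}")
```

Compared with the baseline, a larger n and a one-tailed test raise power, while a smaller α, a smaller discrepancy, and more variability lower it.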
What are some criticisms of, or concerns regarding, traditional hypothesis testing?
You can't report a significant difference without reporting the size of the difference observed (i.e. the effect size) or the associated confidence intervals
-you need the effect size to see whether a study's results are meaningful or important, not just significant
· Arbitrary significance level: α = 0.05 is a convention, not based on the actual consequences of a Type I error
· Dichotomous logic: results are either significant or not, black and white!
· Overemphasis on significance
· Inadequate attention to other factors that influence significance, i.e., sample size and variance (poor control)
What alternatives to traditional hypothesis testing are available? How do they differ (assumptions, interpretation, etc.)?
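· Confidence intervals: give a range of plausible values instead of a yes/no decision, but the alpha level is still arbitrary
· Setting alpha based on the risk of a Type I error: but what constitutes acceptable risk?
· Reporting the actual probability (exact p value) so the reader can determine significance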
How does the sampling distribution of the mean change when sigma is unknown?
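If sigma is known, we use it to calculate the standard error, and the sampling distribution of the z statistic is the normal curve
When sigma is unknown, we must estimate the standard error from the sample standard deviation s; because s changes from sample to sample, there is more variability, so the curve widens and t tests are used
-the t distribution is similar to the normal curve but not exactly the same, especially for small samples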
What is the difference between the t and z distributions? Explain why the t distribution is wider than that for z?
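A t statistic uses the estimated standard error (based on s) instead of the actual population standard error, and that estimate varies from sample to sample. This extra variability gives the t distribution a wider spread and fatter tails than z, so the critical t values are more extreme than the critical z values, which makes the t test more conservative
-the exact areas under the t curve depend on the degrees of freedom, so we look them up in a t table for the appropriate df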
How is the shape of the t distribution affected by df? Explain.
As N increases, the df increase, and the t distribution becomes narrower and comes closer and closer to the z distribution; with small df it is flatter with fatter tails because s is a less reliable estimate of σ
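Why t is wider than z can be seen by simulation: when σ is replaced by s, more statistics land beyond ±1.96 than the 5% the normal curve predicts, especially for small samples. This sketch assumes a hypothetical standard normal population in which H0 is true; the helper name `tail_rate` is hypothetical.

```python
import math
import random

random.seed(4)

def tail_rate(n, reps=20000, known_sigma=False, cutoff=1.96):
    """Share of samples (mu = 0, sigma = 1, so H0 is true) whose statistic falls beyond ±cutoff."""
    hits = 0
    for _ in range(reps):
        sample = [random.gauss(0.0, 1.0) for _ in range(n)]
        m = sum(sample) / n
        if known_sigma:
            sd = 1.0                                  # z: the true sigma is known
        else:
            ss = sum((x - m) ** 2 for x in sample)
            sd = math.sqrt(ss / (n - 1))              # t: sigma estimated by s
        stat = m / (sd / math.sqrt(n))
        if abs(stat) > cutoff:
            hits += 1
    return hits / reps

rate_z = tail_rate(5, known_sigma=True)
rate_t5 = tail_rate(5)
rate_t30 = tail_rate(30)
print(f"z (sigma known), n = 5:  {rate_z:.3f}")
print(f"t (s estimated), n = 5:  {rate_t5:.3f}")
print(f"t (s estimated), n = 30: {rate_t30:.3f}")
```

With n = 5 (df = 4), well over 5% of t values exceed ±1.96, which is why the critical t for small df is larger than 1.96; with n = 30 the rate drops back toward 5%.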
We calculated the "proportion of variance accounted for" (r²) by the IV. What does this mean? What does it reveal about our experiment that isn't revealed by significance testing?
Variance is a measure of how much scores differ from one another in a sample; r² tells us:
* what percentage of the total variance in the DV is accounted for by the treatment (IV)
* the magnitude of the effect, expressed as a percentage
* how much influence the IV had on the DV, which significance testing alone does not reveal
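For a t test, r² is commonly computed from the t statistic and its degrees of freedom as r² = t²/(t² + df). A minimal sketch with hypothetical values (t = 2.5, df = 20); the function name is hypothetical:

```python
def r_squared(t, df):
    """Proportion of variance in the DV accounted for by the IV, from a t statistic."""
    return t ** 2 / (t ** 2 + df)

print(f"{r_squared(2.5, 20):.3f}")   # 6.25 / 26.25, about 0.238
```

So in this hypothetical case the IV would account for roughly 24% of the variance in the DV.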
There are multiple ways of evaluating the outcome from an investigation. Compare the interpretation of: significance, confidence interval, effect size, power, and proportion of variance accounted for.
Confidence intervals give a range of plausible values for the population mean; if we repeated the study many times, 95% of such intervals would contain the population mean. Effect size helps us understand how much overlap there is between the two distributions. Power tells us the likelihood that we will correctly reject the null hypothesis when it is false. Proportion of variance accounted for tells us how strongly the IV and DV are related (a strong or weak relationship)
· Confidence interval: based on the actual data and centered on the sample mean; still based on alpha (and the CLT), but not dichotomous like hypothesis testing
· Hypothesis testing focuses on the null (whether we can reject "no effect")
o black and white, no gray
· Effect size: by how many standard deviations the treatment changed scores relative to the control
· Proportion of variance: how much of the total variance can be accounted for by the IV
How does our sampling distribution change when we use two groups in hypothesis testing?
It changes from a distribution of means to a distribution of differences: a distribution of differences between means (independent groups) or a distribution of mean difference scores (paired groups)
* under the null it is centered on µ1 - µ2 = 0; its variability is the standard error of the difference, s(M1-M2); it is approximately normal (becoming even more normal as n increases)
* made of differences between means
* very robust
-the test is either a paired-samples or an independent-samples t test
What is meant by independent groups designs (between subjects design or between groups design)? Give examples
In a between-groups design, each participant is assigned to only one condition of the independent variable. For example, one group is the control group while the other is the experimental group, and each group is independent of what happens to the other
ex. One group receives treatment for a condition while the other does not and both groups are assessed after a certain period of time to see if there is a difference.
Sometimes we have equal n in two groups that are being compared and sometimes not. What impact does this have on our calculations? What is meant by a "pooled standard error" and why is that necessary?
Using a pooled variance, we are able to average two sample variances while accounting for any differences in the sizes of the two samples
-this is used as an estimate of the common population variance
-the estimate of variance from the larger sample counts for more in the pooled variance than the smaller sample because the larger sample tends to lead to somewhat more accurate estimates than do smaller samples
-by using the degrees of freedom of each, we calculate an average of the two variances
Homogeneity of variance is the assumption that the two populations have equal variances, so the two sample variances should be similar. If they aren't, the pooled estimate can be misleading. The pooled standard error is necessary because it combines both samples into one estimate of the common population variance, which we need to calculate the standard error of the difference between means
* with equal n, the two variances can simply be averaged; with unequal n, they are weighted so that the group with the bigger n (more df) counts more than the other
· Homogeneity of variance: if the sample variances are not close enough, it is not safe to continue with the test
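Pooling can be sketched as a df-weighted average of the two sample variances, which then feeds the standard error of the difference between means. The function name `pooled_t` and the scores are hypothetical, and the sketch assumes homogeneity of variance.

```python
from statistics import fmean, variance

def pooled_t(group_1, group_2):
    """Independent-samples t using a pooled variance weighted by each group's df."""
    n1, n2 = len(group_1), len(group_2)
    df1, df2 = n1 - 1, n2 - 1
    s2_pooled = (df1 * variance(group_1) + df2 * variance(group_2)) / (df1 + df2)
    # standard error of the difference between means, using the pooled variance for both groups
    se_diff = (s2_pooled / n1 + s2_pooled / n2) ** 0.5
    t = (fmean(group_1) - fmean(group_2)) / se_diff
    return s2_pooled, se_diff, t, df1 + df2

s2_p, se, t, df = pooled_t([4, 5, 6, 7, 8], [2, 4, 6])   # hypothetical scores, unequal n
print(f"pooled variance = {s2_p:.2f}, SE = {se:.3f}, t({df}) = {t:.2f}")
```

With variances of 2.5 (df = 4) and 4 (df = 2), the pooled variance is (4 × 2.5 + 2 × 4)/6 = 3.0, closer to the larger group's variance than a simple average (3.25) would be.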
What is meant by homogeneity of variance (homoscedasticity)? How can we test to see whether our data meet this assumption? How rigid (or lenient) is the F-max test in evaluating this assumption, e.g. do the variances have to be exactly the same? How does SPSS evaluate this assumption?
The populations have equal variances (the same spread, or standard deviation). There is some leniency between the samples: as long as their variances are close enough, it is okay to proceed. In most cases, F-max (the largest sample variance divided by the smallest) must be less than or equal to 2 in order to proceed
Tests: F-max (by hand) and Levene's test (SPSS)
F-max is lenient in that it allows considerable deviation from perfectly equal variances
The variances don't have to be exactly the same; they just have to be close enough to continue
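The F-max check is simply the largest sample variance divided by the smallest. A minimal sketch using the rule of thumb stated above (F-max ≤ 2); the function name `f_max` and the scores are hypothetical.

```python
from statistics import variance

def f_max(*groups):
    """F-max: largest sample variance divided by smallest sample variance."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

ratio = f_max([4, 5, 6, 7, 8], [2, 4, 6])   # sample variances 2.5 and 4
print(f"F-max = {ratio:.2f}")               # 1.60
print("OK to pool variances" if ratio <= 2 else "variances may be too different")
```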
What assumptions are behind independent groups tests? What is meant if we say that a given test is robust regarding these assumptions? What does this tell us regarding the importance of the assumptions for the independent groups test?
The assumptions for independent groups tests are the same as for single sample t tests and paired (dependent) sample t tests:
1. The dependent variable is assessed using a scale measure
2. The participants are randomly selected; if they aren't, we must be cautious about generalizing to the population
3. The population distribution should be normal, or the sample size should be at least 30 so that the distribution of means is approximately normal
-the pooled-variance independent-samples t test also assumes homogeneity of variance (equal population variances)
-when assumptions are not met but the test still produces fairly accurate results, we say that the hypothesis test is robust
-meeting these assumptions (which underlie parametric tests) improves the quality of research, but not meeting them does not necessarily invalidate it; because the test is robust, moderate violations are usually not a serious problem
What is meant by dependent groups? Give examples of research designs appropriately analyzed with these techniques.
These are within-groups designs, meaning that the same participants are measured under every condition of the independent variable (unlike independent samples, which use a between-groups design)
ex. Measuring the same participants before and after a treatment (pretest/posttest), or comparing pairs of participants matched on a relevant variable
-In terms of calculations, everything remains the same except sampling error, which decreases because consistent individual differences are accounted for and subtracted out (this ultimately leads to greater power because there is less error)
How is the sampling distribution altered for dependent groups vs. independent groups? Compare the size of the standard error that one would typically obtain for dependent vs. independent groups.
The sampling distribution for dependent groups is a distribution of mean difference scores.
The sampling distribution for independent groups is a distribution of differences between means.
The standard error for dependent groups is typically smaller than that for independent groups (as long as the paired scores are positively correlated).
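The smaller standard error for dependent groups can be seen by analyzing the same hypothetical scores both ways: once as difference scores (paired) and once as two independent groups. The numbers are invented for illustration.

```python
from statistics import fmean, variance

before = [10, 12, 14, 16, 18]   # hypothetical scores for the same five participants
after = [11, 14, 15, 18, 20]
n = len(before)

# Dependent (paired): work with the difference scores
diffs = [a - b for a, b in zip(after, before)]
se_paired = (variance(diffs) / n) ** 0.5

# Independent: treat the two columns as separate groups with a pooled variance
s2_pooled = ((n - 1) * variance(before) + (n - 1) * variance(after)) / (2 * n - 2)
se_independent = (s2_pooled / n + s2_pooled / n) ** 0.5

print(f"mean difference = {fmean(diffs):.2f}")
print(f"SE paired = {se_paired:.3f}, SE independent = {se_independent:.3f}")
```

Because consistent individual differences are subtracted out, the paired standard error (about 0.24) is far smaller than the independent one (about 2.11), even though the mean difference (1.6) is identical.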
How does the correlation between sets of scores influence the outcome for dependent groups? What does this suggest regarding the use of matching variables?
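The variance of the difference scores is s1² + s2² - 2r·s1·s2, so the correlation (r) determines how much error gets subtracted out: the higher the positive correlation, the lower the error.
· The better the groups are matched, the higher the correlation and the lower the error (so matching variables should be strongly related to the DV)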
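The role of r follows from the variance of a difference between two correlated measures, s_D² = s1² + s2² - 2r·s1·s2. A minimal sketch with hypothetical SDs of 10 for both measures; the function name is hypothetical:

```python
def variance_of_differences(s1, s2, r):
    """Variance of difference scores for two correlated measures."""
    return s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2

for r in (0.0, 0.5, 0.9):
    print(f"r = {r}: variance of differences = {variance_of_differences(10, 10, r):.0f}")
```

This prints 200, 100, and 20: the stronger the correlation, the less error remains in the difference scores.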
How is power affected by using independent vs. dependent groups? What does this suggest regarding the use of matching variables?
Dependent groups have more power because there is less variability in the numbers
Power is greater for dependent because there is less error
· Power depends on the degree of correlation: as r increases, power increases
Consider the pros and cons for using dependent groups for hypothesis testing.
PROS - better power, more correlation, less error, subtract out individual differences
CONS - difficulties in matching, carryover effects, loss of extremes
What difficulties are encountered in estimating the required sample size for a given study? What kinds of information are required before you can estimate the appropriate sample size?
You need a lot of information before the study: alpha, beta (or the desired power), the variability in the data, the type of test (one-sample or two-sample, directional or nondirectional, dependent or independent), and the difference to be detected
How does specifying a desired effect size help in estimating sample size? How is this considered a short-cut, i.e. what information is no longer required by using the effect size?
· Because Cohen's d already combines the difference to be detected and the variability (d = difference/σ), we don't need to know either one separately; we only need alpha, the desired power, and d
· Decide what effect size (Cohen's d) you want to be able to detect
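Using d as the shortcut, an approximate sample size per group for two independent groups is n = 2((z for α + z for power)/d)². This sketch uses the normal approximation, so it comes out slightly below exact t-based calculations; the function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80, tails=2):
    """Approximate n per group for two independent groups (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / tails)   # 1.96 for a two-tailed test at 0.05
    z_beta = z.inv_cdf(power)                # about 0.84 for power = 0.80
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

With α = 0.05 (two-tailed) and power = 0.80, this gives about 393, 63, and 25 per group for small, medium, and large effects; neither σ nor the raw difference is needed.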
Why do we calculate confidence intervals?
You need the mean of the difference scores, the critical t statistic, and the standard error to calculate the 95% CI for a paired-samples (dependent-samples) t test
We calculate CIs because they give a range of plausible values for the population mean (or mean difference). If the value stated in the null hypothesis (e.g., a mean difference of 0) falls outside the 95% CI, we reject the null hypothesis
Hypothesis testing only shows whether a difference is significant; the CI also shows how large the difference plausibly is
How does considering our conclusions in terms of effect size help to prevent incorrect interpretations of our findings?
Because a statistically significant result can still have a small effect size (especially with a large sample), reporting the effect size keeps us from claiming that a trivial difference is important.