AP Statistics Summary
Key Concepts:
Terms in this set (734)
Variables can be ________ or ________.
quantitative, qualitative
________ is the use of data to make informed decisions.
Statistics
Which of the following options is the correct answer to the problem?
4+5(2
3)-(
16) = ________
22
Statistics in Latin is ________, which means "state."
status
________ research is best for testing cause and effect relationships between variables.
Experimental
A sample is ________.
a subgroup of the population
Any event or quality that can assume more than one value is called ________.
variable
Data collected from a sample are described by ________.
statistics
Data collected from a population is described by ________.
parameters
In an experiment, the researcher manipulates the ________ variable and measures changes in the ________ variable.
independent, dependent
The number 1456.89009 rounded to the nearest hundredth is ________.
1456.89
The correct response to the following problem is ________.
46
Which of the following options is the correct answer to the problem?
23.67
The number 4,210.4219 rounded to the nearest hundredth is ________.
4,210.42
The number 237.02831 rounded to the nearest thousandth would be ________.
237.028
The data collected from the responses to the following question on a survey would be an example of what type of variable? Please indicate your political affiliation.
categorical variable
A researcher measures aggressive behavior by asking participants the following question: In the past, if you engaged in any of the behaviors listed below, circle the behavior:
hit a person with your fist
used a weapon to harm a person
pushed a person to the ground
In this study, aggressive behavior is measured using a(n) ________ scale of measurement.
nominal
Dr. Z measures each patient's temperature (in Celsius degrees) before administering the experimental medication. Temperature on the Celsius scale is a(n) ________ scale of measurement.
interval
A researcher wants to know if a newly proposed medication has the side effect of weight loss. He weighs a group of rats before starting them on the medication, and then weighs them again after being on the medication for two months. Weight is measured on the ________ scale of measurement.
ratio
Suppose a researcher found that statistics students were less anxious during exams when the exam was administered in a large classroom compared to a small classroom. In this study, the independent variable is ________ and the dependent variable is ________.
classroom size, amount of anxiety
A(n) ________ is similar to a scatter plot, but with the data points connected to indicate that the data is continuous in nature.
line graph
Which of the following is used to graph matched or paired quantitative variables?
scatter plot
A ________ is a collection of observations presented by listing the frequency of each observation's occurrence.
frequency distribution
________ must be the same size, consecutive, and must contain all individual observations.
Class intervals
Which of the following tables represents the correct relative frequencies?
Table 1
Score   f    rf
5       9    .25
4       8    .22
3       10   .28
2       9    .25

Table 2
Score   f    rf
5       9    .35
4       8    .33
3       10   .38
2       9    .35

Table 3
Score   f    rf
5       9    .45
4       8    .44
3       10   .48
2       9    .45

Table 4
Score   f    rf
5       9    .25
4       8    .47
3       10   .75
2       9    1.00
Table 1
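A minimal sketch of why Table 1 is the correct answer: each relative frequency is the frequency divided by the total number of observations (here 36). The names `frequencies` and `relative_frequencies` are illustrative only.

```python
# Frequencies (f) for each score, as listed in all four candidate tables.
frequencies = {5: 9, 4: 8, 3: 10, 2: 9}


def relative_frequencies(freqs):
    """Divide each frequency by the total number of observations."""
    n = sum(freqs.values())
    return {score: f / n for score, f in freqs.items()}


rf = relative_frequencies(frequencies)
for score, value in rf.items():
    # Rounded to two decimals, as in the tables.
    print(score, frequencies[score], round(value, 2))
```

The rounded values (.25, .22, .28, .25) match Table 1, and the unrounded relative frequencies sum to 1.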
Data organized by the number of each possible observation is a ________.
frequency distribution
Picture representations of data are called ________.
graphs
In a cumulative frequency distribution, how does the number of observations in a particular class interval relate to the class interval immediately before it?
The frequency of the observations of the previous class interval is equal to or smaller than that of the succeeding class interval.
Equal size groups in a grouped frequency distribution are called ________.
class intervals
The sum of a cumulative relative frequency distribution is ________.
1
The sum of a cumulative frequency distribution is ________.
variable, depending on the data
Which of the following tables represents the correct relative frequencies for a data set with n=50?
Table 1
The scores on an exam in history of psychology are shown in the following frequency distribution. For the distribution, what is the total number (n) of observations?
27
What type of graph is the following?
HEIGHT AND WEIGHT OF CHILDREN
scatter plot
The following graph indicates that ________.
the more expensive the car, the higher the top-end miles per hour
Scatter plots are used to graph which of the following?
two matched quantitative variables
Lisa surveyed her general psychology class. She asked students to respond to the following question: What portion of the course did you find most helpful for studying for the final exam? Students could choose only one from the following options: lectures; study circles; movie series; textbook; supplemental materials. Which of the following graphs would be the BEST way to present the number of students who selected each option?
bar graph
Given the following data, what would be the most appropriate graph to use for displaying the relationship between the following two variables?
scatterplot
Which of the following is the most appropriate way to graph the following data?
Tests        20%
Quizzes      10%
Attendance   5%
Homework     15%
Project      50%
pie chart
Data collected on all but which of the following measurement scales can be graphed in a histogram?
nominal/ordinal
The ________ is the simplest measure of variability.
Range
The measure of the average amount by which observations deviate from the mean is the ________.
standard deviation
A dataset contains the values 9, 4, 6, 8, 9, 7, 5, 9. The mode of that distribution is equal to ________.
9
Calculate the second quartile for the following data set: 2, 5, 6, 8, 10, 11, 12, 14
9
Which set of scores has the least amount of variability?
212, 214, 215, 216
Measures of central tendency are ________.
scores that represent the center of a distribution.
The ________ is the only measure of central tendency that can possibly have more than one value.
mode
Which measure of central tendency is most affected by outliers?
mean
What is the major problem with using the variance to describe the variability of a distribution?
The variance is expressed in squared units.
A distribution with a high score of 53 and a low score of 10 has a range equal to ________.
43
Compute the range for the following six scores: 5, 18, 24, 65, 89, and 104.
99
A dataset contains the values 9, 4, 6, 8, 9, 7, 5, 9. The variance of that distribution is equal to ________.
3.84
A dataset contains the values 25, 20, 21, 29, and 25. The standard deviation of that distribution is equal to ________.
3.61
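The two answers above appear to use the sample formulas (dividing by n - 1). A short sketch with Python's standard `statistics` module, under that assumption:

```python
import statistics

scores = [9, 4, 6, 8, 9, 7, 5, 9]

# Sample variance: sum of squared deviations divided by n - 1.
sample_variance = statistics.variance(scores)

# Population variance (divides by n), shown only for comparison.
population_variance = statistics.pvariance(scores)

# Sample standard deviation for the second data set.
sample_sd = statistics.stdev([25, 20, 21, 29, 25])

print(round(sample_variance, 2))      # 3.84
print(round(population_variance, 2))  # 3.36
print(round(sample_sd, 2))            # 3.61
```

Dividing by n instead of n - 1 gives 3.36 for the first data set, so the card's 3.84 is consistent with the sample formula.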
What is the interquartile range for the following data set?
10 11 13 14 17 18 19 20 20 21 24 25 29 30 30 33 35 36 37
13
Calculate the second quartile for the following data set:
1, 2, 3, 3, 5, 6, 7, 8
4
The second quartile is equal to which of the following?
median
Calculate the first quartile for the given data set:
5 5 6 6 7 8 9 9 9 9 14 15 19.
6
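The quartile answers in this set are consistent with the "median of each half" method, where the overall median is left out of both halves when n is odd. Other quartile conventions exist and can give slightly different values, so treat this as a sketch under that assumption; `quartiles` is a hypothetical helper name.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return (statistics.median(lower),
            statistics.median(data),
            statistics.median(upper))


q1, q2, q3 = quartiles([5, 5, 6, 6, 7, 8, 9, 9, 9, 9, 14, 15, 19])
print(q1)  # first quartile of the 13-value data set

iqr_data = [10, 11, 13, 14, 17, 18, 19, 20, 20, 21,
            24, 25, 29, 30, 30, 33, 35, 36, 37]
iqr_q1, _, iqr_q3 = quartiles(iqr_data)
print(iqr_q3 - iqr_q1)  # interquartile range of the 19-value data set
```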
A distribution contains the following values: 19, 17, 20, 22, 19, 21, 18, 19. Which of the following statements is an accurate prediction of the standard deviation of the distribution?
The standard deviation of this distribution will be considered small, given the dataset.
An industrial psychologist working for a large corporation wanted to know the extent to which emotional intelligence varied among company supervisors. She randomly selected five supervisors from each division of the company and administered an emotional intelligence test to the entire sample of supervisors. Results are shown below. Which division had the most consistency in their emotional intelligence scores?
Division D
A relatively small standard deviation would indicate which of the following?
The scores in the distribution are very similar.
The correct order of the scales of measurement, from simplest to most complex is ________.
nominal, ordinal, interval, ratio
Qualitative data is most likely to be graphed using ________.
bar graphs
Class intervals ________.
must be the same size
Which of the following is NOT an example of a variable?
categorical quantitative variable
A population refers to ________.
the collection of all possible members of interest to a researcher
Which of the following statements is NOT true with regard to fractions?
When adding fractions, we simply add together the numerators and then add together the denominators.
In research, we typically collect data from a ________ in order to make inferences about a ________.
sample, population
Inferential statistics allow us to ________.
use data from a sample to draw conclusions about our population
The number 3921.231 rounded to the nearest hundredth is ________.
3921.23
Which of the following represents the correct order of operations?
parentheses, exponents or square roots, multiplication or division, addition or subtraction
Which of the following is NOT likely to be considered an extraneous variable in a study that analyzes the correlation between teaching style and attentiveness?
a student in the classroom falls asleep
Non-parametric statistical tests are used with variables collected on the ________ or ________ scales of measurement.
nominal, ordinal
A variable that can be either continuous or discrete is a ________ variable.
quantitative
Variables measured on a(n) ________ or ________ scale permit the most options with respect to the number and inferential power of statistical tests.
interval, ratio
The proportion of observations that lie at or below a particular class interval is called the ________.
cumulative relative frequency
Calculate the mean of the following distribution: 5, 7, 4, 8, 6, 9, 7, 7
6.63
The interquartile range gives us an indication of how much ________.
the middle 50% of our distribution varies
The graph that represents the count for each category is a ________.
bar graph
The cumulative relative frequency will ________.
always be ≤ 1.0
Calculate the standard deviation of the following distribution: 9, 9, 11, 10, 7, 11, 14, 11, 13
2.13
________ is the likelihood of the occurrence of some event.
Probability
In which of the following scenarios would we be calculating the probability of mutually exclusive events?
the probability of drawing a red card or a black card from a deck of cards
________ probability refers to the type of probability when each outcome is equally likely to occur.
Classical
________ probability is based on observations obtained from probability experiments.
Empirical
Independent events are ________.
events in which one event does not affect the probability of the other event occurring
The likelihood of the occurrence of some event is referred to as ________.
probability
Probabilities approaching ________ indicate that an event is not likely to happen.
0
Which of the following rules states that the empirical probability of an event will be close to its theoretical or actual probability if the experiment is performed repeatedly?
Law of Large Numbers
Probability can range from ________ to ________.
0, 1
Calculating the probability that a person drives faster than 80 miles per hour and then has a car accident would mean that we are calculating the probability of two ________ events.
dependent/conditional
Calculating the probability of getting heads on one coin flip and tails on a completely different coin flip would mean that we are calculating the probability of two ________ events.
independent
Calculating the probability of being male and being happy would mean that we are calculating the probability of two ________ events.
independent
"The next animal you see in the zoo is a tiger" and "the next animal you see in the zoo is a lion" are examples of two ________ events.
mutually exclusive
In which of the following problems would we calculate the probability of mutually exclusive events?
the probability of being a gorilla in the wild and in the zoo
In which of the following problems would we calculate the probability of mutually exclusive events?
the probability of being a mustang horse and a Ford Mustang
Given that the sample has 115 boys and 135 girls, calculate the probability of a child being male.
.46
Given the following population data for a particular school, suppose that school administrators pull one student's permanent record at random and then replace it. Then they pull a second record at random. What is the probability that the first record belonged to a female student and the second record belonged to a white student?
Sample Characteristics    Frequency
Girls                     115
Boys                      135
Ethnicity:
  White                   95
  African American        89
  Hispanic/Latino         38
  Other                   28
.17
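Because the first record is replaced before the second draw, the two draws are independent and the multiplication rule applies. A small sketch using the counts from the card (variable names are illustrative):

```python
girls, boys = 115, 135
white = 95

total_students = girls + boys          # 250 students in the population

p_female = girls / total_students      # P(first record is a female student)
p_white = white / total_students       # P(second record is a white student)

# Multiplication rule for independent events: P(A and B) = P(A) * P(B)
p_female_then_white = p_female * p_white

print(round(p_female_then_white, 2))   # 0.17
```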
Given the following data, calculate the probability that at a randomly chosen time of the day, a child is watching either non-educational or educational television.
Children's Daily Activities    Frequency in minutes (each minute of the day is accounted for: 1440 total minutes)
Chores                         32
Educational Activities         78
Outside Play                   65
Personal Care                  60
Sleep                          510
Socializing                    45
Television:
  Educational                  66
  Non-educational              294
Videos (entertainment)         65
Video Games                    125
Other                          100
.25
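Educational and non-educational television are mutually exclusive categories here, so their probabilities simply add. A sketch with the minutes from the card (dictionary keys are illustrative labels):

```python
minutes = {
    "chores": 32,
    "educational_activities": 78,
    "outside_play": 65,
    "personal_care": 60,
    "sleep": 510,
    "socializing": 45,
    "tv_educational": 66,
    "tv_non_educational": 294,
    "videos": 65,
    "video_games": 125,
    "other": 100,
}

total_minutes = sum(minutes.values())  # should account for the whole day

# Addition rule for mutually exclusive events: P(A or B) = P(A) + P(B)
p_tv = (minutes["tv_educational"] + minutes["tv_non_educational"]) / total_minutes

print(total_minutes)  # 1440
print(p_tv)           # 0.25
```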
Given that there are 35 psychology majors at a campus with 500 students, what is the probability that the next student you meet is a psychology major?
.07
Which of the following is NOT a characteristic of a normal curve?
The mean, median and mode are close to one another
Given the following information about a sample, calculate a z score for a test grade of 89.
FALSE -.49
The ________ of a distribution determine the shape of a normal curve.
mean and the standard deviation
Given the following information about a sample, determine the z score for a raw score of 130.
-1.21
The z-distribution has a mean of ________.
0
The smaller the standard deviation, the ________.
less spread out the data are
Sue had two tests this week, one in Algebra and one in English. Her test scores are listed below, along with the class mean and standard deviation. Which of the following statements is NOT true about Sue's performance on these two tests?
Sue's performance is better on the Algebra test than on the English test.
The vertical axis of a standard normal curve represents ________.
FALSE frequency
The ________ of a normal distribution determines the shape of the normal curve.
standard deviation
The approximate percentage of the area of the normal curve that falls between two standard deviations above and below the mean is ________.
95%
The proportion of the area under the curve that falls between the mean and one standard deviation below the mean is ________.
.3413
If a normal distribution has a large standard deviation, the shape of the curve will be ________.
short and spread out
The standard normal distribution has ________.
a set mean and standard deviation
Given the following information about a sample, determine the z score for a raw score of 135.
-.66
Given the following information about a sample, calculate a z score for a test grade of 91.
1.00
Jane had two tests this week, one in Algebra and one in English. Her test scores are listed below, along with the class mean and standard deviation. Which of the following statements is NOT true about Jane's performance on these two tests in comparison to other students?
Jane's performance is better on the English test than on the Algebra test.
A student's test grade is represented as z = -.23. Given that the mean is equal to 82 and the standard deviation is equal to 4.12, what is the raw test score earned by this student (rounded to the nearest whole number)?
81
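Converting a z score back to a raw score rearranges z = (X - mean) / SD into X = mean + z * SD. A sketch with the values from the card (`raw_from_z` is a hypothetical helper name):

```python
def raw_from_z(z, mean, sd):
    """Invert the z-score formula: X = mean + z * sd."""
    return mean + z * sd


raw_score = raw_from_z(z=-0.23, mean=82, sd=4.12)

print(round(raw_score, 2))  # 81.05
print(round(raw_score))     # 81
```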
Given the following information, calculate a z score for a student who scores a 78 on the test.
-.92
Using the Standard Normal Table, determine the proportion of the area under the normal curve that falls between z = -.30 and z = .30.
.2358
Using the Standard Normal Table, determine the proportion of the area under the normal curve that falls between the mean and z = -1.38.
.4162
Using the Standard Normal Table, determine the approximate percentage of the area under the curve that falls above z = .85.
20%
Using the Standard Normal Table, determine the proportion of the area under the normal curve that falls above the test score of 40, given that the mean test score is equal to 36 and the standard deviation is equal to 3.68.
.1379
Using the Standard Normal Table, determine the proportion of the area under the normal curve that falls between the mean and z = 1.29.
.4015
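The Standard Normal Table lookups above can be checked with the standard normal cumulative distribution function, which Python's `math.erf` makes easy to write. This is a sketch rather than the table itself; `phi` is an illustrative name, and the test-score case rounds z to two decimals first, as a printed table would require.

```python
import math


def phi(z):
    """Standard normal CDF: proportion of the area below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


between_pm_030 = phi(0.30) - phi(-0.30)   # between z = -.30 and z = .30
mean_to_neg_138 = phi(0.0) - phi(-1.38)   # between the mean and z = -1.38
above_085 = 1.0 - phi(0.85)               # above z = .85
mean_to_129 = phi(1.29) - phi(0.0)        # between the mean and z = 1.29

# Test score of 40 with mean 36 and SD 3.68; round z as a table lookup would.
z_40 = round((40 - 36) / 3.68, 2)
above_40 = 1.0 - phi(z_40)

print(round(between_pm_030, 4))
print(round(mean_to_neg_138, 4))
print(round(above_085 * 100))   # as an approximate percentage
print(round(above_40, 4))
print(round(mean_to_129, 4))
```

Without rounding z to 1.09 first, the area above a score of 40 comes out slightly higher than the table value of .1379.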
The Central Limit Theorem contends that as n increases, the sampling distribution will ________.
approach a normal distribution
Confidence intervals are calculated in order to determine the range in which the ________ falls.
population mean
We should assume that the sample mean and the population mean ________ because of sampling error.
are not equal
Given the following information, construct a 95% confidence interval.
276.98 - 283.02
As the size of the distribution decreases, the sampling distribution will likely be ________.
FALSE non-normal
The Central Limit Theorem contends that ________.
as n increases, the sampling distribution will approach a normal distribution
The idea that if we repeatedly choose random samples from the same population, the mean of the sampling distribution of sample means will equal the population mean is supported by ________.
the Central Limit Theorem
As the size of the distribution increases, the sampling distribution will approximate a ________.
normal distribution
When calculating a confidence interval using a sample with 37 subjects, the degrees of freedom is equal to ________.
36
Calculating confidence intervals will ________ give us the exact estimate of the population mean.
never
We calculate confidence intervals in order to determine the range in which the ________ falls.
population mean
The term used to indicate how many observations in our distribution can vary is ________.
degrees of freedom
A confidence interval is ________.
a range in which the population mean probably falls
The standard error of the mean is ________.
an indication of how much error exists in the estimated population mean
Calculate the estimated standard error of the mean for a distribution with 225 students, given a mean of 92 and a standard deviation of 10.52.
.70
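The estimated standard error of the mean is the sample standard deviation divided by the square root of the sample size. A sketch with the card's values (the function name is illustrative):

```python
import math


def estimated_standard_error(sd, n):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return sd / math.sqrt(n)


se = estimated_standard_error(sd=10.52, n=225)

print(round(se, 2))  # 0.7
```

The mean of 92 given in the question is not needed for this calculation.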
Given X̄ = 491, s = 33.97, and n = 115, the value for t used in the formula for constructing a 95% confidence interval is:
2.00
Construct a 99% confidence interval using the following information.
44.12-45.88
Construct a 99% confidence interval using the following information.
1788.25-1951.75
Construct a 95% confidence interval using the following information.
86.81-91.19
Given the following information, construct a 95% confidence interval.
293.79-300.21
A probability of .75 indicates that an event is ________.
more likely to occur than not occur
When the population standard deviation is not known, we use the ________.
estimated standard error of the mean
The number of values in our distribution that are free to vary is referred to as the ________.
degrees of freedom
Suppose on a particular test, the class mean was a 94 with a standard deviation of 12. The z-score of 73 is ________.
-1.75
________ are such that one event does not affect the probability of the other event occurring.
Independent events
When one event's probability is affected by another event already having occurred, we have a(n) ________ probability.
conditional
The likelihood that a certain event will occur is referred to as a ________.
probability
To determine the probability of two events occurring together, we would need to use the ________.
multiplication rule
Events that cannot occur together are called ________ events.
mutually exclusive
Which of the following is denoted by E'?
the complement of E
If you rolled a die 500 times, and you rolled a three 275 times, the empirical probability of rolling a three is ________.
.55
What is the theoretical probability of rolling a 2 on a single roll of the die?
.17
If an institution with 1,121 total students has 47 psychology majors and 53 communication majors, and 20 of those students are both psychology and communication majors, what is the probability that the next student you meet will be a psychology or communication major?
.07
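Because 20 students hold both majors, the events are not mutually exclusive, and the general addition rule subtracts the overlap so those students are not counted twice. A sketch with the card's counts (variable names are illustrative):

```python
total_students = 1121
psychology = 47
communication = 53
both = 20

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_either = (psychology + communication - both) / total_students

print(round(p_either, 2))  # 0.07
```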
If the probability of an event occurring is equal to .47, the probability of the complement of that event occurring is equal to ________.
.53
John and Edward were having a discussion about what television character was the better crime solver. John said it was impossible to compare John's favorite TV prosecutor to Edward's favorite TV sheriff. Edward, having taken a statistics course, knew that the two could be compared. John found that TV prosecutors won an average of 12.4 cases per season, with a standard deviation of 2.3; Edward found that TV sheriffs made 20.4 arrests with a standard deviation of 4.5. Using what you know about z scores, determine if John's favorite TV prosecutor, who has won 14.3 cases during this season, served more justice than Edward's favorite TV sheriff, who had 26 arrests during this season.
The z score for Edward's favorite sheriff is 1.24; the z score for John's favorite prosecutor is .83. Therefore, Edward's favorite sheriff served more justice than John's favorite prosecutor
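The comparison above works because z scores put both characters on the same scale: each raw score is expressed as a number of standard deviations from its own group mean. A sketch with the card's values (`z_score` is an illustrative helper):

```python
def z_score(raw, mean, sd):
    """Standardize a raw score: z = (raw - mean) / sd."""
    return (raw - mean) / sd


z_prosecutor = z_score(raw=14.3, mean=12.4, sd=2.3)
z_sheriff = z_score(raw=26, mean=20.4, sd=4.5)

print(round(z_prosecutor, 2))  # 0.83
print(round(z_sheriff, 2))     # 1.24
print(z_sheriff > z_prosecutor)
```

The same helper reproduces the earlier card with a class mean of 94 and a standard deviation of 12, where a score of 73 gives z = -1.75.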
Select the statement about normal distributions that is true.
The higher the standard deviation, the more short and wide the curve.
Given a distribution of raw scores, select the statement that is true regarding means, standard deviations, and z-scores.
The distribution will only have one mean and standard deviation, while there are many different z-scores.
Given the following information, which of the following is the most appropriate 95% confidence interval?
13.25 to 15.25
We calculate a confidence interval in order to have confidence in ________.
our estimate of the population parameter
Given the following information, which of the following is the most appropriate 99% confidence interval?
183.24 to 196.22
When we fail to reject the null hypothesis, which of the following is true?
group differences are due to chance, not to the conditions of the experiment
When we specify the direction of the difference between two groups, the hypothesis is _________.
one-tailed
If the difference is predicted but the direction is not specified, the hypothesis is ________.
two-tailed
The ________ describes what the researcher expects to see happen in the experiment.
research hypothesis
In a research study, tobt = 8.387 and tcrit = 3.234. Which of the following is true?
We reject the null hypothesis.
The process of using sample statistics to test a claim about the value of a population parameter is referred to as ________.
hypothesis testing
The hypothesis that describes what the researcher expects to see happen in the experiment is the ________.
research hypothesis
The hypothesis that supports the claim that differences between groups are due to chance is the ________.
null hypothesis
A statistical hypothesis is ________.
a statement or claim about a population parameter
A researcher failed to reject the null hypothesis, but in reality, the null hypothesis should have been rejected. This is an example of which type of error?
Type II
Failing to reject the null hypothesis when it should have been rejected is an example of a ________.
Type II error
What is tobt for the following research study?
1.84
In a research study, tobt = -5.387 and tcrit = 2.145. Which of the following conclusions is true?
We reject the null hypothesis; our group means are significantly different from one another.
Given the following results from a t-test, which of the following statements is an accurate interpretation?
tobt = 1.972 and tcrit = 2.000
We fail to reject the null hypothesis; our group means are not significantly different from one another.
Given the following hypotheses, which statement is correct?
A one-tailed test of the hypothesis should be used.
A researcher believes alcohol consumption will increase reaction time while driving. Using a driving simulator, the researcher divided participants into two groups: group one that will drive without consuming alcohol and group two that will drive under the influence of alcohol. Reaction times to several driving distractions are recorded for all participants. In this scenario, the correct research hypothesis to use is ________.
H1: X̄2 > X̄1
A researcher predicts that caffeine will decrease the time required for participants to complete a memory task. She divides participants into two groups: group one that consumes 2 cups of caffeinated coffee 30 minutes before completing the memory task, and group two that doesn't consume any coffee or caffeinated beverages at all. The researcher measures the time required for each participant to complete the memory task. Which of the following is the correct research hypothesis to use in this scenario?
H1: X̄2 > X̄1
A researcher predicts that antihistamines will slow reaction time in participants. She divides participants into two groups: group one who takes an antihistamine 30 minutes before completing the task, and group two who doesn't take an antihistamine at all. Which of the following is the correct null hypothesis to use in this scenario?
H0: X̄2 = X̄1
Which of the following is NOT a typical research hypothesis?
H1: X̄1 = X̄2
Given the following hypotheses, which statement is correct?
H0: X̄1 = X̄2
H1: X̄1 ≠ X̄2
A two-tailed test of the hypothesis should be used.
If a researcher divides a pool of 90 participants into a control group and a treatment group, our samples are considered ________.
independent
The following is an example of what type of hypotheses?
Null Hypothesis
To compare the mean scores from one English class to another English class, we need to use a(n) ________.
independent samples t-test
Volunteers signed up for a new weight loss program. They were weighed at the start, and then after one month on the program they were weighed again to determine if their weights were lower. This study would involve testing what type of samples?
dependent samples
When using an independent samples t-test, the ________.
research hypothesis is not directly tested
The degrees of freedom for an independent samples t-test with 20 participants in each group would be equal to ________.
38
The correct degrees of freedom for a dependent samples t-test with 55 participants in each group would be equal to ________.
54
When two samples are linked on a case-by-case basis, our samples are ________.
dependent
A researcher performed an experiment that examined wine drinking behavior in men and women. Each male was matched with a woman of similar age and health status. To analyze the data, a ______ would be performed.
two-sample t-test with dependent samples
A researcher wishes to study the effects of antihistamines on motor skills in a sample drawn from allergy sufferers. For group one, no antihistamines are given, while in group two, a typical dose of a popular antihistamine is given. After 30 minutes, both groups are asked to complete several activities that require fine motor skills, and each participant is timed to see how long it takes to complete the activities. The mean times for each group are tested to determine if they are significantly different from one another. Which of the following tests would the researcher need to use to determine if there are significant differences between the two groups?
Two sample t-test with independent samples
Volunteers signed up for a new weight loss program. They were weighed at the start, and then after one month on the program. The researcher wanted to determine if the volunteers' weights were significantly lower after one month on the program. In this study, the researcher is dealing with ________ samples.
Dependent
A researcher wished to study the effects of alcohol on the time required to complete a motor task. Participants were timed completing the task. They were then given one alcoholic beverage to consume. Twenty minutes later their performance on the task was again timed. The researcher compared the two means to determine if they were significantly different from one another. In this study, the researcher is dealing with ________ samples.
dependent
A researcher wishes to determine if students who study for 30 minutes each day for one week will earn higher test scores than students who study for the same total amount of time (210 minutes) on the day before the test. To test this research statement, which of the following hypothesis tests should the researcher use?
independent samples t-test
The following is an example of what type of hypotheses?
H0: X̄1 = X̄2
null hypothesis
The following is an example of what type of hypotheses?
H1: X̄1 ≠ X̄2
A two-tailed research hypothesis.
The null hypothesis states that ________.
There is no difference between the parameters of the two populations.
Hypothesis testing with two independent samples allows us to answer which of the following questions?
Are the two means different from one another due to chance?
If tobt = 2.01 and tcrit = 2.060 at the .05 significance level, what conclusions can be drawn?
We fail to reject the null hypothesis; our group means are not significantly different from one another.
The correct degrees of freedom formula for a t-test for independent samples is ________.
n1+n2-2
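The degrees-of-freedom answers in this set follow two simple formulas: n1 + n2 - 2 for independent samples and (number of pairs) - 1 for dependent samples. A sketch with hypothetical helper names:

```python
def df_independent(n1, n2):
    """Degrees of freedom for an independent samples t-test."""
    return n1 + n2 - 2


def df_dependent(n_pairs):
    """Degrees of freedom for a dependent samples t-test."""
    return n_pairs - 1


print(df_independent(20, 20))  # 38
print(df_dependent(55))        # 54
```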
Hypothesis testing with two dependent samples allows us to answer which of the following questions?
Are the two means different from one another due to chance?
With a two-tailed hypothesis test with a dependent samples t-test, if tobt = 2.421 and tcrit = 2.447 at the .05 significance level, what conclusions can be drawn?
We fail to reject the null hypothesis; our group means are not significantly different from one another.
With a two-tailed hypothesis test with a dependent samples t-test, if tobt = -4.61 and tcrit = 2.571 at the .05 significance level, what conclusions can be drawn?
We reject the null hypothesis; our group means are significantly different from one another.
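The decision rule used in these t-test cards can be sketched as a comparison of the absolute value of tobt against tcrit. This assumes a two-tailed test and that tcrit has already been looked up for the right degrees of freedom and significance level; `t_test_decision` is an illustrative name.

```python
def t_test_decision(t_obt, t_crit):
    """Two-tailed rule: reject H0 when |t_obt| meets or exceeds t_crit."""
    if abs(t_obt) >= t_crit:
        return "reject H0"
    return "fail to reject H0"


print(t_test_decision(2.421, 2.447))   # fail to reject H0
print(t_test_decision(-4.61, 2.571))   # reject H0
```

Taking the absolute value is what lets a negative tobt such as -4.61 still lead to rejection.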
A researcher wants to know if sports positively influence self-esteem scores in middle school aged girls. He identifies a sample of 13 year old girls who do not participate in sports, and administers a self-esteem scale. The girls then choose a sport in which to participate. The researcher administers the same self-esteem scale again to the girls after two months of playing a sport. Identify the appropriate hypothesis test required for this research scenario.
two-sample t-test with dependent samples
Analysis of Variance can be used for hypothesis testing in all but which of the following situations?
to compare a group's standardized test score to the norm
The F ratio ________.
can be any positive number
The ratio of the variance between groups to the variance within groups is referred to as the ________.
F ratio
Tukey's HSD is used to determine ________.
which means are significantly different from one another
In order to fail to reject the null hypothesis in ANOVA, which of the following must be true?
Fobt < Fcrit
Q can ________.
be any positive number
Given that Fobt = 4.98 and Fcrit = 5.79, which of the following statements must be true?
We fail to reject the H0; there are no significant mean differences between our groups.
If Fobt = 4.52 and Fcrit = 4.88, what would be the next step in the ANOVA process?
We fail to reject the H0; there are no next steps.
In ANOVA, the null hypothesis assumes that ________.
all means are equal to one another
Calculate the F ratio given the following information:
MSB = 25.32
MSW = 14.21
dfB = 4
dfW = 95
k = 5
nTOTAL = 100
1.78
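The F ratio is the mean square between groups divided by the mean square within groups; the other values on the card are consistent with dfB = k - 1 and dfW = nTOTAL - k. A sketch with the card's numbers (variable names are illustrative):

```python
ms_between = 25.32
ms_within = 14.21
k = 5            # number of groups
n_total = 100    # total observations across all groups

df_between = k - 1        # matches dfB = 4
df_within = n_total - k   # matches dfW = 95

# F ratio: variance between groups relative to variance within groups.
f_ratio = ms_between / ms_within

print(df_between, df_within)  # 4 95
print(round(f_ratio, 2))      # 1.78
```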
The distribution of F ________.
is restricted; it can be any positive number
Given that Fobt= 37.91 and Fcrit= 15.98, which of the following statements is an accurate interpretation?
We reject the H0; there are significant mean differences somewhere between our groups.
With ANOVA, we are likely to see significant differences in means among our groups if we see ________.
more variation between the groups than within the groups
Given that Fobt = 7.23 and Fcrit = 5.79, which of the following statements is an accurate interpretation?
We reject the H0; there are significant mean differences somewhere between our groups.
In ANOVA, the research hypothesis assumes that ________.
at least one mean differs from another mean
In ANOVA, the null hypothesis assumes that ________.
all means are equal to one another
In order to reject the null hypothesis in ANOVA, which of the following must be true?
Fobt ≥ Fcrit
Calculate the F ratio given the following information.
2.87
Given the following information in the ANOVA Summary Table, calculate the value of dfB.
3
The ________ is calculated by adding together all of the individual observations across all groups and dividing by the total number of observations.
grand mean
If Fobt =5.97 and Fcrit =4.88, what would be the next step in the ANOVA process?
We reject the H0; we now complete Tukey's HSD.
Tukey's HSD enables us to determine ________.
which group means are significantly different from one another
If Qobt= 3.90 and Qcrit = 4.04, what conclusion can we draw?
There is not a significant difference between the two means used to calculate Q.
Given the following ANOVA Summary Table, calculate the value of the Q statistic for two groups if the mean of group one equals 11.58 and the mean of group two equals 14.21 (there are three groups, with 15 in each group).
7.18
What is the range of the distribution of Q?
Q can be any positive number.
Which of the following is NOT a typical research hypothesis?
H1: x1 = X2
Susan was devastated because she failed to reject the null hypothesis during her science fair project. She had high hopes for her project, because the literature she read on the topic indicated that she should find significant results. If Susan should have rejected the null hypothesis rather than fail to reject it, it is possible that she inadvertently committed a ________ error.
Type II
Which of the following is NOT true for independent samples?
members may be subjected to pre- and post-testing to determine significant effects of a treatment
Which of the following is the appropriate null hypothesis for a one-sample t-test?
FALSE H0: x1 = X2
The ________ hypothesis claims that any differences seen between groups are due to chance rather than actual differences.
null
Which of the following is the appropriate null hypothesis for a dependent samples t-test?
H0: x1 = X2
Hypothesis testing uses ________ to test claims about the value of the ________.
sample statistics, population parameter
Determining which inferential statistical test to use in a research study depends on ________.
the specific hypotheses we are attempting to test
Jonathan completed his research study for his physiological psychology course. His results indicated that he should reject the null hypothesis. His instructor, however, said that Jonathan must have done something incorrectly because he actually should have failed to reject the null hypothesis. In this situation, it is likely that Jonathan inadvertently committed a ________ error.
Type I
The statement or claim about a population parameter is called a ________.
statistical hypothesis
Inferential statistical tests use ________ to make inferences about ________.
sample statistics, population parameters
A statement of what the researcher wants or expects to happen in her research study is the ________.
alternate hypothesis
ANOVA assumes which of the following?
that samples are independent random samples
Complete the appropriate test of the hypotheses given the data listed in the table below. Which of the following options is the most appropriate conclusion we can draw from the results?
Determine if there is a significant difference between completion rate times (in minutes) on a motor skills task for groups with differing blood alcohol levels (no alcohol or very little; some alcohol; moderate alcohol consumption)
Using ANOVA and Tukey's HSD, it was determined that there are significant differences between groups 1 and 3 and 2 and 3, but not between 1 and 2.
Which of the following is NOT a step in conducting a test of the hypothesis?
determine if the null hypothesis is one-tailed or two-tailed
If we wish to test a sample mean against a known population mean, which of the following inferential tests of the hypothesis would we use?
one-sample t-test
Dr. Sanga randomly selected 80 participants who suffer from allergies for a research study. He then divided the participants into two groups with 40 participants each. Participants in group 1 received a low dose of a new antihistamine; participants in group 2 received a placebo. Each group was then tested in a driving simulator to determine if the new antihistamine impaired motor and cognitive skills. Given the research scenario, which of the following inferential tests would Dr. Sanga use to statistically determine if the new antihistamine affects motor and cognitive skills?
independent samples t-test
A college instructor decided to test the success of her test review program. She compared the test scores of the students who attended the review to the test scores of the students who did not attend the review. She predicted that the students who attended the review would have significantly higher test scores than those who did not attend. Was her prediction correct?
No. Using an independent samples t-test, it was determined that the Review group performed about the same on the test as the Non-Review group.
Analysis of variance measures the amount of variation ________.
between groups and within groups
With a regression equation of Y' = .27 + 3.21 X, what is the predicted value if X = 27?
86.94
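As a quick check of the card above, here is a small sketch that plugs X into a regression equation of the form Y' = a + bX. The helper name `predict` is hypothetical.

```python
def predict(a: float, b: float, x: float) -> float:
    """Predicted value Y' = a + b * X for a simple regression line."""
    return a + b * x


# Intercept, slope, and X from the card: Y' = .27 + 3.21X with X = 27.
y_hat = predict(0.27, 3.21, 27)
print(round(y_hat, 2))
```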
Given the following information, calculate the standard error of the estimate for a regression line. Round everything to two decimal places.
45.21
FALSE 20.46
A study found that socio-economic status was negatively correlated with the number of arrests. This means ________.
as socio-economic status increases, the number of arrests decreases
Regression equations allow us to use one variable to ________.
make predictions about the value of the other variable
We use ________ to graph relationships between variables.
scatterplots
What does the following scatter plot indicate about the relationship between the number of hours spent studying each week and the current cumulative academic grade point average?
there is a moderate positive relationship
What does the following scatter plot indicate about the relationship between air temperature and gas bill?
negative relationship
A study found that higher concentrations of aluminum in the brain were positively correlated with your chances of being diagnosed with Alzheimer's Disease. This means ________.
aluminum is related to but does not cause Alzheimer's Disease
A friend is working on his research paper for class. He tells you he's found the correlation between the number of hours spent watching television and grade point average is r= -2.3. This means:
The relationship between watching television and grade point average is still unknown, as this r value is impossible.
A researcher finds a significant correlation between height and the chances of becoming president of r = .6 and a significant correlation between age and becoming president of r = - .70. Of the two correlations, which relationship is the strongest?
The relationship between age and becoming president is the strongest.
Which of the following would result in a significant correlation?
robt > rcrit
The coefficient of determination gives an indication of ________.
the amount of variance in one variable that is predictable from variance in the other variable
If r = .56, then the coefficient of determination is equal to ________.
.31
If r = -.64, then the coefficient of determination is equal to ________.
.41
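The two cards above both square r. A short sketch, assuming the standard definition (coefficient of determination = r squared); the function name is illustrative, and the range check reflects the earlier card noting that values such as r = -2.3 are impossible.

```python
def coefficient_of_determination(r: float) -> float:
    """Square Pearson's r; reject values outside the valid range of -1 to +1."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    return r ** 2


for r in (0.56, -0.64):
    # The sign of r disappears once it is squared.
    print(r, round(coefficient_of_determination(r), 2))
```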
In working through your statistics homework, you found that the coefficient of determination is equal to .80. You conclude that ________.
you can predict 80% of the variance in one variable from the variance in the other variable
Mike's Auto Shop has more repairs when it's cold outside than when it's hot outside. Sam, the manager at the auto shop, tracked data over a year so he could calculate a regression equation to predict the number of jobs they will have based on the air temperature outside. He wants to use this prediction in order to better staff the garage during really busy times. Given the following information, determine which of the following is closest to the regression equation for predicting the number of repairs from the air temperature outside.
Mean air temperature: 62; standard deviation: 10.09
Mean number of repairs: 52; standard deviation: 9.10
r = - .91
Y' = 102.84 - .82X
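One way to reach this answer, sketched under the assumption that the course uses the standard formulas b = r(sY / sX) and a = mean(Y) - b * mean(X), rounding b to two decimals before computing a. The function name `regression_line` is hypothetical.

```python
def regression_line(mean_x, sd_x, mean_y, sd_y, r):
    """Return (a, b) for Y' = a + bX, rounding each step to two decimals."""
    b = round(r * sd_y / sd_x, 2)          # slope
    a = round(mean_y - b * mean_x, 2)      # intercept, using the rounded slope
    return a, b


# X = air temperature, Y = number of repairs (values from the card above).
a, b = regression_line(mean_x=62, sd_x=10.09, mean_y=52, sd_y=9.10, r=-0.91)
print(f"Y' = {a} + ({b})X")
```

Rounding the slope first matters here: carrying the unrounded slope through gives an intercept a few hundredths higher, which is why the card asks for the closest option.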
Eddie receives a letter from the admissions committee from State University stating that his predicted first semester grade point average would be a 1.7. This means that he ________.
should be concerned but also should keep in mind that this prediction is only a best guess
Given the following information, calculate the standard error of the estimate for a regression line. Round everything to two decimal places.
174.21
5.38
Given the following information, calculate the standard error of the estimate for a regression line. Round everything to two decimal places.
14.98
.53
Standard deviation is to mean as ________ is to ________.
Standard error of the estimate, Y'
Expected frequency is the frequency we would expect to see in each cell if ________.
there existed no relationship between the variables
If chi-square is equal to 4.62, which of the following statements is true for a 4x5 contingency table at ?
chi-square obtained is less than the chi-square critical value, so there is not a significant relationship between the two variables
What is the expected frequency for the highlighted cell?
39
39.58
A research question that can be answered by using the chi-square test of independence is ________.
FALSE: Is there a relationship between the hours of television watched per week and one's annual salary?
FALSE Is there a relationship between the hours of television watched per week and one's annual salary?
Which of the following is NOT a research question that can be answered using the chi-square test of independence?
FALSE: Does sex relate to one's preference for dessert options?
Which of the following research questions could be answered using a chi-square test of independence?
FALSE: Does one's salary relate to the square foot of one's house?
If chi-square is equal to 114.83, which of the following statements is true for a 4x5 contingency table at ?
chi-square obtained is greater than chi-square critical value, so there is a significant relationship between the two variables
A contingency table contains all of the following EXCEPT ________.
relative frequencies
The chi-square test of independence is referred to as a(n) ________ test.
non-parametric
Chi-square test of independence can be used with which type of data?
categorical
Which of the following is an inaccurate statement with respect to the chi-square test of independence?
The chi-square test of independence is used to test relationships between interval and ratio data.
If we calculate a significant chi-square, which of the following is an appropriate conclusion that we can make?
one variable's relationship to the other variable is not due to chance
In a chi-square test of independence, the observed frequencies are recorded in a ________.
contingency table
The frequencies that we would expect to find in each cell if no relationship existed between the two variables is called the ________.
expected frequency
Which of the following formulas is the correct calculation for the expected frequency?
row total times column total divided by n
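A small sketch of the expected-frequency formula (row total × column total ÷ n) and the chi-square statistic built from it. The 2x2 counts below are invented for illustration and do not come from any table in this set.

```python
def expected_frequencies(observed):
    """Expected count for each cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


observed = [[30, 10], [20, 40]]  # hypothetical contingency table
print(expected_frequencies(observed))
print(round(chi_square(observed), 2))
```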
If chi-square is equal to 16.24, which of the following statements is true for a 3x3 contingency table at ?
chi-square obtained is greater than chi-square critical value, so there is a significant relationship between the two variables
If chi-square is equal to 19.43, which of the following statements is an accurate interpretation for a 4x5 contingency table at α = .01?
chi-square obtained is less than the chi-square critical value, so there is not a significant relationship between the two variables
Which of the following could be a correct conclusion about ?
x2obt > x2crit , therefore is significant.
What is the expected frequency for the highlighted cell?
40
75
Given the following contingency table, calculate .
14.62
A chi-square test of independence can answer which of the following research questions?
Is there a relationship between students' pass rate from 5th grade to 6th grade and whether they take a multivitamin each day?
A chi-square test of independence can answer which of the following research questions?
Do remedial students pass the standardized reading test at a rate different from the non-remedial students?
Which of the following research questions could be answered using a chi-square test of independence?
Do more female students than male students sit in the front of the classroom?
If ________, t is NOT significant, which indicates that the group means are not significantly different from one another.
FALSE p >= .05
Significant mean differences are found when which of the following is true?
P <= .05
Calculating a regression equation with Excel returns all of the following EXCEPT ________.
t
Most of the statistical tests used in this course must be accessed through ________.
the Data Analysis Pack
In Excel, we use ________ to indicate that the cell from which certain information comes should NOT change when the formula is cut and pasted in another cell.
$
The Descriptive Statistics command returns all of the following EXCEPT the ________.
correlation
When we enter data into Excel, we often use text at the top of the column or at the far left of a row to indicate what data we are including. When we run certain tests in Excel, we must tell Excel that there is text in either the first row or the first cell of each column. Otherwise, Excel will try to treat these as numbers. We accomplish this task by which of the following?
by checking the appropriate Labels box within the statistical test we are running
Each Excel page is divided into ________, with letters at the top of each column and numbers along the left of each row.
cells
How can most of the statistical tests covered by this course be accessed and used in Excel?
by installing and selecting the correct test in the Data Analysis Pack
To test the relationship between two quantitative variables, which of the following options do you select from the Data Analysis menu?
correlation
Which of the following is NOT information that is needed for Excel to calculate a z score?
the range
To calculate measures of central tendency and measures of variability, which option do you select from the Data Analysis menu?
Descriptive Statistics
Given the following descriptive statistics table, select the correct average test score for the students who were in the computer group.
83.5
Given the following output from Excel, identify the standard deviation for Overall GPA.
.55
When we compute a correlation using Excel, what information is NOT given to us by Excel?
the significance of the relationship between two variables
A researcher wanted to determine if there is a difference between males and females on GRE test scores. Which of the following hypothesis tests would the researcher need to perform to answer this question?
t-test: Two-Sample Assuming Equal Variances
A teacher wonders if her students taking the hybrid history course are performing as well as those taking the course face to face. She fears that the students in the hybrid course will score significantly worse than the students in the face to face class on the next exam. Once the exam is scored, the teacher performs a t-test to determine if the hybrid course test scores are significantly lower than the face to face course test scores. Using the t-table below, determine which of the following conclusions is correct.
p = .03; the hybrid mean is significantly lower than the face-to-face mean
A teacher needs evidence to present to the Dean of the College that her hybrid and online students are performing as well on tests as her traditional face to face students. She collects test scores from all three groups and uses an ANOVA to determine if she is correct. Interpret the results from the table, and choose the statement that correctly states the conclusion.
Since p = .18; there are no significant differences between the test scores of the three groups of students.
Given the following Regression Summary Table, the value that we use for b in the regression formula is ________, rounded to the nearest hundredths.
-.51
Given the following Summary Output Table for Regression from Excel, what would the regression equation be?
Y'= -3.80 + .08X
Which of the following is NOT an acceptable value for r?
FALSE: 0
and
-.99
The correct formula for calculating degrees of freedom for Pearson's Product Moment Correlation is ________.
df = n-2
Which of the following functions will calculate z scores in Excel?
FALSE: standardize from the Data Analysis Toolbox
Data that is categorical in nature can be analyzed by which of the following statistical tests?
chi-square test of independence
When a research situation uses categorical data, the special analysis techniques we must use to analyze the data are referred to as ________ tests.
non-parametric
Using hypothesis testing for correlation, rejecting the null hypothesis will indicate that a significant relationship between the two variables ________.
exists in the population
Which of the following values for r would indicate a strong, negative correlation?
- .90
Given a significant r = .68 for Variable A and Variable B, which of the following is the correct calculation and interpretation of the coefficient of determination?
The coefficient of determination is equal to .46, meaning that 46% of the variance in Variable A can be explained by the variance in Variable B.
Which of the following indicates the strongest relationship between the two variables?
- .79
Pearson's r is also referred to as the ________.
correlation coefficient
To quantitatively measure the relationship between two variables, a researcher should use which of the following statistical tests?
correlation
If a researcher finds a significant correlation between the level of education and annual salary of r = .91, which of the following assumptions can we make?
we can assume that a strong relationship exists between level of education and annual salary
A non-parametric test that allows us to draw conclusions about the relationship between two variables is the ________.
chi-square test of independence
Calculate chi-square. Choose the response option that MOST accurately interprets the results (assume a .05 level of significance).
x2 = 26 , there is a significant relationship between presidential candidate choice and education level
A significant chi-square indicates which of the following?
one variable is associated with another variable due to something other than chance
Which of the following is NOT required when calculating a value for chi-square?
degrees of freedom
The table used in a chi-square test of independence to record the observational data is called the ________ table.
contingency
Given the following table from Excel output when a regression analysis was run, determine the value of b in our regression formula (rounded to the nearest hundredth).
.02
Based on the measures of central tendency shown in this Descriptive Statistics table from Excel, would you assume that there are any extreme scores in our distribution?
possibly, since the mode = 50, which is much higher than the mean and the median
Given the following table from Excel output when a regression analysis was run, determine the value of a in our regression formula (rounded to the nearest hundredth).
.22
What can we assume from the following correlation matrix from Excel?
the correlation between age and distance walked is .40
Using the ANOVA table listed below, what should our conclusion be?
Since p < .05, we reject the null hypothesis; our means are significantly different from one another. We must now complete Tukey's HSD to determine which means differ significantly from one another.
Population
An entire group of study
Census
When we ask every member of the entire population to gain information about it
Parameter
Number used to describe the population (Ex. 96% of WLN students went to college last year; the parameter is the 96%).
Sample
A small section of the population that we use to gain information about the whole population
Statistic
Number used to describe the sample (Ex. Out of 45 parents, 60% agree that children are a good idea; the statistic is the 60%)
A statistic describes a _______
Sample
A parameter describes a ________
Population
Observational Study
We observe individuals and measure variables without imposing treatment or attempting to influence the response in any way
Experiment
Impose a treatment, then measure and observe the response variables; we evaluate the effects of treatments imposed on experimental units
Survey (Sample Survey)
We ask questions to sample groups to make inferences about the population
What is another word for a Survey?
Sample Survey
When referring to Surveys, what does the term "inference" imply?
A guess as to what the population would be like if a census were done
What three topics must we think of before preparing a survey?
1). What population do we want to describe?
2). What are we measuring?
3). How will we measure it?
When conducting a survey, you'll most likely get ___________ (define)
Sampling variability - We're not likely to get the exact same, identical result with each sample
What can go wrong with sample surveys?
There will always be error due to sampling variability, so the statistic will usually not be identical to the parameter.
Bias
Using a method that will consistently over/underestimate the wanted value. It favors an outcome. Whenever we have bias, we must elaborate whether we think we are overestimating or underestimating
Sampling Error
Errors from the way we pick a sample
Sampling Frame
The original LIST that we pull the sample from (ideally should be the population)
Give three examples of sampling frames
1). Yellow Pages
2). Classroom
3). Registered members of the Commerce Library Book Club
What are the types of sampling errors?
1). Convenience
2). Voluntary
3). Undercoverage
Convenience Sampling
When we only use people to whom we have easy access, so they usually have similar thoughts as we do
What is usually true about the experimental units of a convenience sample?
They usually have similar thoughts as the experimenter does
Voluntary Sampling
Self-selection; it allows people to choose to take part in the experiment; they put themselves in the sample. They usually have very strong opinions
What is usually true about the experimental units of a voluntary sample?
They typically have strong opinions, because they went out of their way to ensure that their voice was heard
Undercoverage
Leaving people out of the sampling frame
Give an example of Undercoverage
Using the yellow pages to call (excludes unlisted people, people without phones, prison inmates, homeless, people who only use cell phones, people in college)
Non-Sampling Error
An error that does not draw from the way we pick a sample (The sample was picked well, but other problems afterward caused bias)
Nonresponse Bias
People can't be reached or refuse to answer
Response Bias
Subjects give unreliable answers because they are uncomfortable telling the truth to the experimenter
Wording Bias
The wording of the question(s) the experimenter asks is confusing or leading, or the order of the questions skews the answers
SRS
Simple Random Sample; "Of size n consists of n individuals from population chosen in such a way that every set of n individuals has an equal chance to be in the sample"; every individual and every sample has an equal chance of being chosen; like pulling names from a hat
How do you create an SRS?
1). Label - Assign a number to every individual in the population
2). Table - Randomly pick the sample (numbers in a hat, Random Digit Generator, Random Digit Table)
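The label-then-pick steps above can be sketched with Python's standard `random` module standing in for the hat or random digit table. The roster of 30 labeled students and the seed are assumptions made up for this example.

```python
import random

# Step 1 (Label): give every individual in the population a label.
roster = [f"student_{i:02d}" for i in range(1, 31)]  # hypothetical population

# Step 2 (Table): let chance pick the sample, without replacement.
rng = random.Random(42)  # fixed seed only so the example is repeatable
srs = rng.sample(roster, k=5)
print(srs)
```

Because `sample` draws without replacement, every set of 5 students is equally likely to be chosen, which is the defining property of an SRS.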
Is an SRS always realistic?
No, so we use other methods
Stratified Random Sample
The population is divided into strata and we pull an SRS from each stratum
Strata
Groups of similar individuals (singular: stratum)
Cluster Sample
Split population into groups and use SRS to pick an entire group and interview every member of that group
Give an example of a Cluster Sample
Use SRS to pick a grade and ask each member of that grade their opinion on taking Finals (each grade is a cluster)
Systematic Sample
Use a fixed sampling interval starting at a random point: the first item is selected at random from the first k items in the frame, and then every kth item is included in the sample. If the starting point is self-picked (not by chance), that is a self-selection sampling error
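A minimal sketch of a systematic sample, assuming the sampling frame is an ordered list. The function name `systematic_sample` and the frame of 100 numbered items are hypothetical.

```python
import random


def systematic_sample(frame, k, rng=random):
    """Pick a random start among the first k items, then take every kth item."""
    start = rng.randrange(k)  # chance, not the experimenter, picks the start
    return frame[start::k]


frame = list(range(1, 101))  # hypothetical frame: items numbered 1 to 100
sample = systematic_sample(frame, k=10, rng=random.Random(7))
print(sample)
```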
How does Stratifying reduce sampling variability?
It uses strata (it groups individuals by similar characteristics), so there is less variation within each group
Why do we sample?
To get an overall feel for the population (keep in mind, though, that the results most likely WILL NOT be exact)
Which give more accurate results - larger or smaller samples?
Larger samples
Why do we sample instead of conducting a census?
It would be too expensive, take too long, and be too impractical to conduct a whole census every time we want information, so we use samples to infer about the population
Double-Blind Experiment
Neither the test subject nor the experimenter knows which treatment was given, so as to avoid bias
Statistically Significant
A response too great to be caused by chance alone
Matched Pairs Design
Experimental units are paired in twos based on a blocking variable and then treatments are randomly assigned between the two individuals in the pair
Blocking
Using groups of similar individuals in an experiment. Can help to avoid confounding variables
Stratifying is to sampling as ___________ is to experimenting
Blocking
Placebo Effect
Skewed results caused by a subject's expectations: subjects respond to a dummy treatment because they believe it will work
Confounding
Indecision as to whether or not response variable y was manipulated by explanatory variable x or another explanatory/lurking variable z
Experimental Units
Non-human individuals involved in an experiment
Treatment
Something imposed on test subjects for evaluation; Combination of all factors and each level; What is done to units
Factor
Otherwise known as an Explanatory Variable, it is the cause of the outcome
What are the three Principles of Experimental Design?
1). Control - Control the environment to make conditions as similar as possible
2). Randomization - Make groups roughly equivalent by spreading out uncontrollable things
3). Replication - Use enough units in each group to show the same results (don't trust if only done once)
Completely Randomized Experiment
Treatments assigned to all experimental units by chance (tend to use a control group as the baseline of data)
How does utilizing randomization in samples differ from randomization in experiments?
Sample: Randomly pick people to be involved
Experiment: Pick people based on certain characteristics and randomly assign treatment (self-pick units and randomly give treatments)
How would a diagram to chart Experimental Research be Drawn?
Randomly Assign individuals into groups, each with different treatments, and compare the responses at the end
At the end of an experiment, you need to know how to write at the end:
• How you would assign groups
• How you would assign treatments
• Description of each treatment
• Always compare at the end
Blinding
Not telling people who had what treatment (one or both parties don't know who has what treatment)
Single-Blind Experiment
One class (those who influence results and those who evaluate) doesn't know which group of individuals has which treatment (doesn't matter as to which class doesn't know)
The goal of an experiment is to show:
Cause/Effect (Causation)
What percent of an experiment's response determined by chance must be had in order for an experiment to be statistically significant?
<5%
The best experiments have these qualities:
• Randomized
• Results are compared
• Double-Blind
• Placebo-Controlled
True or False: Only an Experiment can show Causation
True; only an experiment can show causation (in order to show cause/effect, there must be an experiment at play)
What do you know about the Blinding of an Experiment with a Placebo?
It is at least Single-Blind (because the people don't know they're given a placebo), but can be Double-Blind
What can spread out lurking variables?
Randomization
Is Replication a good practice in experimental designs because it eliminates chance variation, or because it allows for chance variation to be estimated?
It allows for chance variation to be estimated
Does randomization make the treatment groups as similar as possible, make the treatment groups as different as possible, or reduce variability within treatment groups?
Makes the treatment groups as similar as possible
Keep in mind, randomization spreads out lurking variables, and by doing so, you are making the test groups more similar. This way, when causation is shown, the exact cause can be pinpointed
True or False: It is key that an experimenter randomly assigns treatments in an experiment. If not, then it is an Observational Study
True
Are different sample sizes a potential issue?
No
Which method: Blocking or Matched-Pairs will be the best guard against confounding variables?
Blocking because you want to separate the large characteristics (ex. gender) before you conduct an experiment, as those can prove to be lurking variables
If a sample survey is simply asking questions to sample groups without attempting to get a feel for the population those sample groups represent, is that still a sample survey?
No, it is an Observational Study
Does random mean haphazard?
No
Explain the quote: "Chance behavior is unpredictable in the short run, but has a predictable, regular pattern in the long run"
Things are random if individual outcomes are uncertain, but there is a regular distribution of outcomes in a large number of repetitions
Probability
Long-run relative frequency of events; proportion of times the outcome would occur in a very long series of repetitions
In order for an event to be random, it must be ______________
Independent
Law of Large Numbers
Long run frequency of repeated events gets closer and closer to the true probability as the number of trials increases
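The Law of Large Numbers can be seen in a quick simulation. This is only an illustrative sketch: the fair coin, the seed, and the checkpoints are all assumptions chosen for the example.

```python
import random


def running_proportion(n_flips, p=0.5, seed=0):
    """Flip a coin n_flips times and record the proportion of heads so far."""
    rng = random.Random(seed)
    heads = 0
    checkpoints = {}
    for i in range(1, n_flips + 1):
        heads += rng.random() < p  # True counts as 1 head
        if i in (10, 100, 1_000, 10_000, 100_000):
            checkpoints[i] = heads / i
    return checkpoints


props = running_proportion(100_000)
for flips, proportion in props.items():
    print(flips, proportion)
```

The early proportions can wander well away from 0.5, while the later ones settle close to it. No single flip becomes more predictable; only the long-run proportion does.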
Law of Averages
BAD! STAY AWAY!
Thinking an outcome will happen ("it's due") based on the past
Sample Space
SET of all possible outcomes
{H, T} ----------> Sample Space of Flipping a Coin
Event
An outcome or set of outcomes of a random phenomenon; what we are looking for to happen
Probability Model
Mathematical description of a random phenomenon containing two parts:
+ Sample Space
+ Way of Assigning Probabilities to Outcomes
P(A)
Probability of Event A
Multiplication Principle
If you can do one event n number of ways, and another m number of ways, then you can do both in mn number of ways
What are the five rules of Probability?
1). The P(an event) is always between 0 and 1, inclusive. It either never happens (0), always happens (1), or falls in between
2). P(S) = 1
(The probability that some outcome in the sample space occurs is always equal to 1)
3). The complement of A (written A', sometimes seen as A with a superscript c) is the event that A doesn't occur; its probability can be found by:
P(A') = 1 - P(A)
4). Addition Rule (of Disjoint Events) - If two events A and B are disjoint, then the probability of their union, P(A U B), is the sum of their probabilities
P(A U B) = P(A) + P(B)
5). Multiplication Rule (of Independent Events) - If two events A and B are independent, then the probability of their intersection, P(A n B), is the product of their probabilities
P(A n B) = P(A)P(B)
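The five rules can be checked on a small probability model. The sketch below assumes one roll of a fair six-sided die; the events A, B, and C are hypothetical and chosen so that A and B are disjoint while A and C are independent.

```python
from fractions import Fraction

# Hypothetical probability model: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) when every outcome in the sample space is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {1, 2}       # roll a 1 or a 2
B = {5, 6}       # roll a 5 or a 6 (disjoint from A)
C = {2, 4, 6}    # roll an even number (independent of A)

print(prob(sample_space))                    # Rule 2: P(S)
print(prob(sample_space - A), 1 - prob(A))   # Rule 3: complement
print(prob(A | B), prob(A) + prob(B))        # Rule 4: disjoint events, union
print(prob(A & C), prob(A) * prob(C))        # Rule 5: independent events, intersection
```

Using Fraction keeps the probabilities exact, so the two values printed on each line can be compared directly.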
The Union of two events uses addition or multiplication?
Addition
The Intersection of two events uses addition or multiplication?
Multiplication
Union means AND or OR?
Or
Intersection means AND or OR?
And
Independent
Knowledge of Event A gives no information about B
Disjoint
Events A and B can't happen at the same time
The Addition Rule is used for _____________ Events
Disjoint
The Multiplication Rule is used for _______________ Events
Independent
If A and B are independent, then so are ______________, but not ____________
A and B'
A' and B
A' and B'
NOT A and A'
Two-Way Table
A table that describes two categorical variables
Marginal Distribution
The total in the margins
Joint Frequency
Each entry in the table (main part, not the margins)
Conditional Distribution
Finding the Conditional Probability of a Two-Way Table:
Joint Frequency / Marginal Distribution
Is the General Addition Rule used exclusively for disjoint events? What is the General Addition Rule?
No! That is the Addition Rule!
The General Addition Rule states:
For any two events A and B, P(A U B) can be found by:
P(A U B) = P(A) + P(B) - P(A n B)
The last part could be ignored in disjoint cases because for disjoint events, P(A n B) = 0
Is the General Multiplication Rule used exclusively for independent events? What is the General Multiplication Rule?
No! That is the Multiplication Rule!
The General Multiplication Rule states:
For any two events A and B, P(A n B) can be found by:
P(A n B) = P(A)P(B | A)
OR
P(A n B) = P(B)P(A | B)
For independent events the conditional part simplifies: A has no effect on B and vice versa, so P(B | A) = P(B) and the rule reduces to P(A n B) = P(A)P(B). This is the basis behind the equation used to check whether two events are independent
Conditional Probability
Probability that one event happens given another event
P(B | A) = P(A n B) / P(A)
We get this by modifying the General Multiplication Rule [Divide both sides by P(A) to isolate P(B | A)]
Can disjoint events ever be independent?
NO!
If we know that A and B are disjoint, then we know that A happening affects B happening in the sense that B cannot occur
What is the equation to find out if events are independent or not?
P(B | A) = P(B)
Because if A and B are independent, then A should have no effect on B
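The general addition rule, conditional probability, and the independence check can all be read off a two-way table. The counts below are made up for illustration; they are an assumption, not data from the cards.

```python
from fractions import Fraction

# Hypothetical two-way table of counts (made-up numbers).
# Rows: A happened or not; columns: B happened or not.
counts = {
    ("A", "B"): 20, ("A", "not B"): 30,
    ("not A", "B"): 10, ("not A", "not B"): 40,
}
total = sum(counts.values())

# Marginal totals divided by the grand total give P(A) and P(B).
p_a = Fraction(counts[("A", "B")] + counts[("A", "not B")], total)
p_b = Fraction(counts[("A", "B")] + counts[("not A", "B")], total)
p_a_and_b = Fraction(counts[("A", "B")], total)

p_a_or_b = p_a + p_b - p_a_and_b     # General Addition Rule
p_b_given_a = p_a_and_b / p_a        # Conditional probability P(B | A)
independent = (p_b_given_a == p_b)   # Independence check: P(B | A) = P(B)?

print(p_a_or_b, p_b_given_a, independent)
```

In this made-up table P(B | A) differs from P(B), so knowing that A happened changes the probability of B and the events are not independent.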
When do you use a Venn Diagram?
When two events share probabilities
When do you use a Tree Diagram?
When there are multiple steps involved
Pie Chart
Used with categorical data and it must add up to 100%
Very rarely used (only to compare parts to the whole)
For Categorical Data (I.e.: the colors of different socks)
Bar Graph
+ Used with Categorical Data
+ Usually Shows Frequency (Count)
+ Bars are free standing (Don't Touch)
+ Bars can be in any order
+ Used for categorical data (I.e.: the colors of different socks)
Quantitative Data
Numerical data (counts or measurements) gathered from surveys, experiments, and observational studies
What makes a Good Graph?
+ Title
+ Axes Labeled
+ Constant Scale
Dot Plot (How is One Made)?
+ Draw a Horizontal Line and Scale Based on the Numbers
+ Title the Dot Plot and Label the Axes
+ Place One Dot Above the Appropriate Value for each data point
What Four Attributes are Used to Interpret the Distribution of Quantitative Data
+ Shape
+ Center
+ Spread
+ Outliers or Not
What are the Four Shapes of a Quantitative Data Graph?
+ Bell
+ Uniform
+ Left - Skewed
+ Right - Skewed
Shape of a Bell Graph
Ummm... Bell -shaped?
Uniform Graph
Shaped Like a Table:
____________________
I I
(Except the lines touch)
Remember that "Uniform" means every value (or interval) occurs about equally often, so the bars are all about the same height
Left - Skewed
The long tail extends to the left; the mean is pulled to the left (toward the tail)
Right - Skewed
The long tail extends to the right; the mean is pulled to the right (toward the tail)
What Two Concepts are Associated in Determining the Center of a Graph?
The median/mean
Spread of a Graph
How much variability from the first number to the last number
Written: Lowest to Highest Intervals (Ex. 4 - 23)
Range
A way of determining spread, how many numbers the data ranges:
Highest Value - Lowest Value = Range
(Ex. 23 - 4 = 19)
IS NOT RESISTANT TO OUTLIERS (it is computed from the two most extreme values)
Outlier
Anything outside the norm of other points
Stemplot (Stem-and-Leaf Plot)
Quick picture or shape of distribution while including actual values
How to Create a Stemplot
1). Create a stem (all numbers in the tens place written vertically with a vertical line to the right)
2). Write each leaf (the ones digit) in the appropriate row, with the leaves increasing as they extend out from the stem
3). Include a key (and always include a title)
Splitting Stems
Take one stem and break it into two when too much data (refer to Assignment 31 - Chapter 1: Data Set #1)
What can Splitting the Stems help to identify?
If there are any outliers
When do we Split the Stem?
When there is too much data, or data is overcrowded
Back-to-Back Stemplot
Two sets of data share 1 stem (refer to Assignment #31 - Chapter 1: Data Set #1)
Histogram
Breaks values into classes/intervals of equal width and displays frequency/percent of each class. The bars touch because of the continuous interval.
If there is a large sample size, a Histogram is the best choice for displaying the data (if, of course, it is quantitative)
What is the Only Downside to a Histogram?
It only tells the frequency/percent, you can't see specific data points
When a Curve Appears to be a Bell, but is not Exact, What is the Shape Referred to As?
Approximately Symmetric (Refer to Assignment #31 - Chapter 1: Data Set #1)
Ogive
Also known as Relative Cumulative Frequency Graph, an Ogive is used when we need to know how one value compares to others.
An Ogive shows percents
In the Left-Most Set of a Back-to-Back Stemplot, where is the "Right Side" of the Data?
The bottom
So right-skewed would mean that most of the data is found near the top (AKA, the "Left Side")
Mean
Average Value
The fair share; the amount everyone would get if they all had the same amount; the balancing point
When referring to Quantitative Data Distribution Graphs, what does n mean?
n = the number of data entries
Mean of the Population
μ
Mean of the Sample
x̄
Mean Formula
"The sum of observations / n"
Is Mean Resistant to Outliers?
No, because one number can make the mean much larger/smaller than it actually is
Median
Typical Value
Midpoint of distribution; half the observations are below this point and half are above
How to Find Median
+ Arrange all Numbers Smallest to Largest
+ In cases where n is odd, the median is the middle number
+ In cases where n is even, the median is the average of the two middle numbers
Is Median Resistant to Outliers?
Yes, because we are only interested in the positions of numbers, not their values
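A small example makes the difference in resistance concrete. The two data sets below are hypothetical; the second replaces the largest value with an extreme one.

```python
import statistics

data = [4, 7, 8, 10, 11]            # hypothetical data, no outlier
with_outlier = [4, 7, 8, 10, 110]   # same data with the largest value replaced by 110

print(statistics.mean(data), statistics.median(data))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

The median stays at the middle position's value in both lists, while the mean is dragged toward the extreme value.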
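Mean, Median, and Mode Locations of a Symmetrical or Approximately Symmetrical Graph
Mean, median, and mode are all at (or near) the center
Mean, Median, and Mode Locations of a Left - Skewed Graph
Typically mean < median < mode (the mean is pulled toward the left tail)
Mean, Median, and Mode Locations of a Right - Skewed Graph
Typically mode < median < mean (the mean is pulled toward the right tail)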
When is Mean Used?
When a graph is symmetric or approximately symmetric (because it is affected by outliers)
When is Median Used?
When a graph is skewed or has outliers (because outliers don't affect it)
Spread
The amount of variability in the distribution
What Can We Use to Get a Measure of Spread that is Resistant to Outliers?
The IQR (middle 50% of data)
Quartile 1
Q1
Median of the lower half of data
Quartile 3
Q3
Median of upper half of data
IQR
Q3 - Q1
Represents the middle 50% of data
Are IQR and Quartiles Resistant to Outliers?
Yes, because they are based on the median
Boundaries for Outliers
[Anything beyond these values is an outlier]
Q1 - 1.5IQR
Q3 + 1.5IQR
If you know that an error occurred in your gathering of data, and in turn this creates an outlier, what can you do?
Drop the value from the data set
HOWEVER
If there is an outlier that wasn't caused by an error, then you MUST include it in the data set
Five Number Summary
Minimum, Quartile 1, Median, Quartile 3, and Maximum
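The quartiles, IQR, outlier boundaries, and five-number summary can be computed directly from the definitions above (Q1 and Q3 as the medians of the lower and upper halves). This is a sketch under that convention; calculators and software may use slightly different quartile rules, and the data set is made up.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of the halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the median position
    upper = data[half + n % 2:]   # values above the median position
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

data = [2, 4, 5, 7, 8, 9, 10, 12, 30]   # hypothetical data set
minimum, q1, median, q3, maximum = five_number_summary(data)

iqr = q3 - q1
low_fence = q1 - 1.5 * iqr     # anything below this is an outlier
high_fence = q3 + 1.5 * iqr    # anything above this is an outlier
outliers = [x for x in data if x < low_fence or x > high_fence]

print((minimum, q1, median, q3, maximum), iqr, outliers)
```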
Box Plot
Shows the Five Number Summary and Any Outliers
+ Box from Q1 to Q3
+ Vertical Line inside the Box is the Median
+ Horizontal Lines Extend to Minimum and Maximum (Not Including Outliers)
+ Outliers are Dots Outside of Lines
How to Make a Box Plot and Find the Five Number Summary with a Calculator (TI - 84)
+ Insert Data
+ 2nd - Stat Plot (Hit 2nd - y=)
+ Choose Plot 1
+ Turn on Plot 1
+ Go to the Fourth Option (Box Plot with the Dots)
+ Zoom 9
+ Trace to Jump Between Values of the Five Number Summary
Percentile
Measure of relative location
Finds what percent of observations lie at or below a certain value:
# of values at or below a certain number / # of values
Most Common Measure of Spread
Standard Deviation
Standard Deviation
Mean based; Looks at how far, on average, each observation is from the mean
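Variance (in a Population)
σ² = Σ(xi - μ)² / n
Variance (in a Sample)
s² = Σ(xi - x̄)² / (n - 1)
Standard Deviation (in a Population)
σ = √( Σ(xi - μ)² / n )
Standard Deviation (in a Sample)
s = √( Σ(xi - x̄)² / (n - 1) )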
When determining Standard Deviation (or Variance), we divide by n in the case of:
Populations
When determining Standard Deviation (or Variance), we divide by (n-1) in the case of:
Samples
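Python's standard statistics module offers both versions, which makes the n versus (n-1) distinction easy to see. The data set is hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations; the mean is 5

# Treating the data as a whole population: divide by n.
population_variance = statistics.pvariance(data)
population_sd = statistics.pstdev(data)

# Treating the data as a sample: divide by n - 1.
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(population_variance, population_sd)
print(sample_variance, sample_sd)
```

Dividing by the smaller number (n-1) makes the sample version slightly larger than the population version for the same data.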
When is the ONLY time we use Standard Deviation?
When the mean is used (data is approximately symmetric with no outliers)
What is always true about the value of Standard Deviation?
Sx ≥ 0
Sx = 0 when there is no variability in the data set at all (all the observations are the same)
When does Sx get larger?
The more spread out the data is
Why do we take the square root when finding Standard Deviation?
Sx has the same unit of measure as the original observations
How to Find Mean and Standard Deviation in the Calculator:
+ Put Data into a List
+ Stat. ---> Calc. ---> 1 Variable Statistics
+ (Data is in the List you Put it In)
Meanings of the Different Symbols You'll Encounter when Finding Sx in a Calculator
x̄ = Mean
∑x = Sum of x
∑x^2 = Square each data point, then find the sum
Sx = Standard Deviation of Sample (Divide by n-1)
σx = Standard Deviation of Population (Divide by n)
Can you ever cross between using Mean and IQR, or Median and Standard Deviation?
No!
Median and IQR are used for one distribution
Mean and Sx are used for another
How to Describe the Distribution of an Addition/Subtraction Error on a Calculator:
+ Put Data into a List
+ Go One List Over and Do all of the First List's Values ± the Correct Number (For example, L2 - 13 will go into the L3 column)
How will Adding or Subtracting the same #a to each observation affect the distribution?
+ Add/Subtract #a to each measure of center (Mean, Median, Quartiles) and to each measure of location (Percentiles)
+ Shape is unaffected
+ Spread is unaffected
How to Describe the Distribution of a Multiplication/Division Error on a Calculator:
+ Put Data into a List (L1)
+ Go One List Over (L2) and Do all of L1's Values Multiplied by the Change (L1 * 23 information will go into L2)
How will Multiplying or Dividing each observation by the same #b affect the distribution?
+ Multiply/Divide each measure of center/location (Mean, Median, Quartiles, Percentiles) by b
+ Multiply/Divide each measure of spread (IQR, Range, Sx) by |b|
+ Shape is never affected
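Both kinds of change can be checked numerically. In this sketch the data, the shift a = 13, and the multiplier b = -2 are all hypothetical values picked for illustration.

```python
import math
import statistics

def summary(values):
    """Center and spread measures for a list of observations."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),
    }

data = [10, 20, 30, 40, 50]        # hypothetical observations
shifted = [x + 13 for x in data]   # add a = 13 to every observation
scaled = [x * -2 for x in data]    # multiply every observation by b = -2

print(summary(data))
print(summary(shifted))   # center moves by 13, spread is unchanged
print(summary(scaled))    # center is multiplied by -2, spread by |-2| = 2
```

A negative multiplier is used on purpose: it shows why spread is multiplied by |b| rather than by b.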
Is a Percentile affected by both a change in the addition/subtraction of a distribution as well as a multiplication/division change?
Yes
Is a Measure of Spread affected by both a change in the addition/subtraction of a distribution as well as a multiplication/division change?
No, only a multiplication/division change
Is a Measure of Center affected by both a change in the addition/subtraction of a distribution as well as a multiplication/division change?
Yes
Is Shape affected by both a change in the addition/subtraction of a distribution as well as a multiplication/division change?
No, shape isn't affected by either
How would you go about increasing x by 5%?
1.05x
Univariate Statistics
Single Variable Statistics (How is the Data Distributed?)
Bivariate Statistics
Two Variable Statistics (To Examine the Relationship Between the Two Variables)
Explanatory Variable
Explains or influences change in the response variable (explanatory variable is on the x - axis)
Response Variable
Measures the outcome of a study (on the y - axis)
Do We Always have Explanatory and Response Variables?
No
Scatterplot
Most useful graph for displaying the relationship between two quantitative variables measured on the SAME individuals
Each individual appears as a single point on the graph
Does Association Imply Causation?
No, only an experiment can show causation
How is a Scatterplot Interpreted?
Direction, Form, and Strength
What are the Three Different Measures of Direction in a Scatterplot?
Positive (aimed up), Negative (aimed down), None (no direction)
What are the Three Different Measures of Form in a Scatterplot?
Linear, Curved, and Clustered
What are the Three Different Measures of Strength in a Scatterplot?
Strong, Moderate, Weak
Strength
How close the points follow a clear form
Correlation Coefficient (r)
Measures the direction and strength of a linear relationship between two quantitative variables
What are the Seven Concepts of a Correlation Coefficient?
-1 ≤ r ≤ 1
r > 0 if positive, r < 0 if negative
If r is close to 0, there is a weak linear relationship
If r = 1 or r = -1, there is a perfectly linear relationship
Makes no distinction between explanatory and response variables
r doesn't change if we change units, because...
The correlation coefficient has no units
Can the Correlation Coefficient be Used for a Scatterplot of Any Form?
No, only a linear form
Cautions about Calculating the Correlation Coefficient
+ Both variables MUST be quantitative
+ Only strength and direction of a linear form
+ 0 does not mean "no relationship", it means "no LINEAR relationship"
+ r is NOT RESISTANT to outliers (because mean and standard deviation are in the formula)
+ r cannot say that the explanatory variable caused the response variable, it only says there is SOME sort of relationship
Correlation
Measures direction and strength of a linear relationship
Regression Line (Line of Best Fit)
A line that describes how a response variable, y, changes with an explanatory variable, x
Regression Line Formula
ŷ = a + bx
ŷ
Predicted value of a response variable for a given value of x
a (in the Regression Formula)
The y-intercept: the predicted value of y when x = 0. Only meaningful if x-values near 0 are within the range of the data
b (in the Regression Formula)
Slope; the amount by which y is predicted to change when x is increased by 1 unit
Extrapolation
Use of a Regression Line for predicting values outside the whole interval of x - values. (If the x - values range from 10 - 20, you couldn't use a Regression Line outside of that range)
Residual
(Error) Difference between actual and predicted values
Residual = actual y - ŷ
If a Residual is Negative, did the Line Over or Under Predict?
Over predicted
If ŷ = 2, but y = 1, then:
Residual = -1
This means that the line over predicted, because ŷ > y
What are the Seven Concepts of a Least - Squares Regression Line?
+ No line is perfect (in most cases) but aim to be as close as possible
+ Due to the fact that we will not pass every point and the distance is as small as possible, there will be error (residual)
+ The goal of our line is to make the vertical distances (residuals) from the line to the actual y-values as small as possible
+ Always passes through the point (mean of x, mean of y)
+ If all the residuals were added together, it would be equal to 0, because it passes through the balancing point (mean of x, mean of y). Remember that the mean is the balancing point
+ Because the residuals add to 0, we square them before finding the sum
+ The goal of the Least - Squares Regression Line is to make the sum of the squares as small as possible
Slope Formula Given the Standard Deviation of x, Standard Deviation of y, and the r
b = r(Sy / Sx)
Y - Intercept Formula Given the Mean of y and the Slope (Found in Another Formula)
a = (mean of y) - b(mean of x)
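The slope and intercept formulas can be applied by hand to a small data set. The x and y values below are made up for illustration; r is computed as the sum of the products of z-scores divided by n-1.

```python
import statistics

# Hypothetical data set (made up for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)   # sample SDs (divide by n - 1)

# Correlation: sum of products of z-scores, divided by n - 1.
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)) / (n - 1)

b = r * s_y / s_x        # slope: b = r(Sy / Sx)
a = y_bar - b * x_bar    # y-intercept: a = (mean of y) - b(mean of x)

predicted = [a + b * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]   # actual - predicted

print(f"y-hat = {a:.2f} + {b:.2f}x")
print("sum of residuals:", round(sum(residuals), 10))
print("r^2:", round(r ** 2, 4))
```

The residuals add to (essentially) zero, and plugging the mean of x into the line returns the mean of y, matching the properties listed above.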
How to Interpret the Slope when Finding ŷ?
___________ is predicted to _____________ (increase/decrease based on whether or not b is positive or negative) by ____________ (|b|) for each additional _____________________ (x)
Residual Plot
A graphical tool to tell how well a model fits the data and if it is appropriate to be used
Should show no obvious pattern, shape, or direction and the residuals should be relatively small (line that comes close to most of the points)
How to Make a Residual Plot in a TI - 84 Calculator
Turn on a Scatterplot ---> 2nd Stat ---> 7 (Residual)
Standard Deviation of Residuals
Typical distance between the actual y-values and the regression line (the typical size of a residual); the closer to 0, the better
Coefficient of Determination
r^2
A numerical value that tells how well a Least - Squares Regression Line does at predicting y-values. This value can be given as a decimal or as a percent
How do you Interpret the Coefficient of Determination?
_______% of the variation in [the response variable y] can be accounted for by the linear relationship with [explanatory variable x]
When Given a Computer Output, How do you Find r and r^2?
r^2 is the number next to "R squared" NOT "R squared (adjusted)"
r is the square root of this number, with the same sign as the slope
When Given a Computer Output, How do you Find the Response Variable?
It is given at the top, listed as the "Dependent Variable"
When Given a Computer Output, How do you Find the Standard Deviation of the Residuals?
It is the number equal to "s"
When Given a Computer Output, How do you Find the whole Regression Line Formula?
ŷ is the "Dependent Variable" hat
a is the first number listed under "Coefficient" (next to "Constant")
b is the second number listed under "Coefficient" (next to the explanatory variable)
x is the second name listed under "Variable" (under "Constant")
Outlier in a Scatterplot
Still lies outside of the overall pattern; points that are outliers in the y direction but not the x have large residuals
Influential Points
Points that are outliers in the direction of the x, but not the y (removing it would greatly change the Least - Squares Regression Line)
What Happens to the y-Intercept and the Slope of a Least - Squares Regression Line when an Influential Point is Present?
The y - intercept increases, because the slope decreases (think of it on a spring)
What are the Three Concepts you Use to Determine if a Relationship is Linear or Not?
+ Does the Scatterplot Look Linear?
+ Does the Residual Plot Look Random?
+ Does r^2 Indicate that a Linear Model is Appropriate? In Other Words, is it a High Percentage
When do we Re-Express Data?
When we want to represent the data, but we can't because it doesn't look linear
Exponential Model
Refers to a case in which we attempt to re-express non-linear data to interpret it correctly
An exponential model changes the original y vs. x scatterplot and makes it:
log(y) vs. x
Only the log(y) is taken
Power Model
Refers to a case in which we attempt to re-express non-linear data to interpret it correctly
A power model changes the original y vs. x scatterplot and makes it:
log(y) vs. log(x)
Both the log(y) and the log(x) are taken
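The reason these re-expressions work can be seen with exact (noise-free) data. In this sketch the two models y = 3(2^x) and y = 3x^2 are hypothetical; if a transformed plot is linear, the slope between consecutive points is constant.

```python
import math

xs = [1, 2, 3, 4, 5]
exponential_y = [3 * 2 ** x for x in xs]   # hypothetical exponential data
power_y = [3 * x ** 2 for x in xs]         # hypothetical power data

def consecutive_slopes(us, vs):
    """Slope between each pair of neighboring points (u, v)."""
    pairs = list(zip(us, vs))
    return [(v2 - v1) / (u2 - u1) for (u1, v1), (u2, v2) in zip(pairs, pairs[1:])]

# Exponential model: log(y) vs. x is linear (constant slope log10(2)).
exp_slopes = consecutive_slopes(xs, [math.log10(y) for y in exponential_y])

# Power model: log(y) vs. log(x) is linear (constant slope 2, the exponent).
pow_slopes = consecutive_slopes([math.log10(x) for x in xs],
                                [math.log10(y) for y in power_y])

print(exp_slopes)
print(pow_slopes)
```

With real data the slopes would only be roughly constant, so a residual plot of the transformed data is still the check to use.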
What happens to the correlation coefficient if all of the points change in the same way?
It DOES NOT CHANGE!
If you add 90 to all the x-values, multiply all the y-values by 56, then switch the x- and y-values, r will stay the same!
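This claim is easy to verify numerically. The data set below is made up, and the helper correlation is written out by hand (the sum of z-score products divided by n-1) rather than taken from a library.

```python
import math
import statistics

def correlation(xs, ys):
    """r = (1/(n-1)) * sum of the products of the z-scores."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]

r_original = correlation(x, y)

shifted_x = [xi + 90 for xi in x]   # add 90 to every x-value
scaled_y = [yi * 56 for yi in y]    # multiply every y-value by 56
r_transformed = correlation(scaled_y, shifted_x)   # x and y switched

print(r_original, r_transformed)
```

One caveat worth remembering: multiplying one variable by a negative number keeps the size of r but flips its sign.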
Can points be spread differently if they have the same regression line?
Yes! Absolutely!
Addition Rule
P(A ∪ B) = P(A) + P(B) - P(A ∩ B) aids in computing the chance that at least one of several events occurs.
Alpha (α)
The probability of a Type I error. See significance level.
Alternative Hypothesis
The hypothesis stating what the researcher is seeking evidence of. A statement of inequality. It can be written looking for the difference or change in one direction from the null hypothesis or both.
Association
Relationship between or among variables.
Back-Transform
The process by which values are substituted into a model of transformed data, and then reversing the transforming process to obtain the predicted value or model for nontransformed data.
Bar Chart
A graphical display used with categorical data, where frequencies for each category are shown in vertical bars.
Bell-Shaped
Often used to describe the normal distribution. See mound-shaped.
Beta (β)
The probability of a Type II error. See power.
Bias
The term for systematic deviation from the truth (parameter), caused by systematically favoring some outcomes over others.
Biased
A sampling method is biased if it tends to produce samples that do not represent the population.
Bimodal
A distribution with two clear peaks.
Binomial Distribution
The probability distribution of a binomial random variable.
Binomial Random Variable
A random variable x (a) that has a fixed number of trials of a random phenomenon n, (b) that has only two possible outcomes on each trial, (c) for which the probability of a success is constant for each trial, and (d) for which each trial is independent of other trials.
Bins
The intervals that define the "bars" of a histogram.
Bivariate Data
Consists of two variables, an explanatory and a response variable, usually quantitative.
Blinding
Practice of denying knowledge to subjects about which treatment is imposed upon them.
Blocks
Subgroups of the experimental units that are separated by some characteristic before treatments are assigned because they may respond differently to the treatments.
Box-And-Whisker Plot/Boxplot
A graphical display of the five-number summary of a set of data, which also shows outliers.
Categorical Variable
A variable recorded as labels, names, or other non-numerical outcomes.
Census
A study that observes, or attempts to observe, every individual in a population.
Central Limit Theorem
As the size n of a simple random sample increases, the shape of the sampling distribution of x̄ tends toward being normally distributed.
Chance Device
A mechanism used to determine random outcomes.
Cluster Sample
A sample in which a simple random sample of heterogeneous subgroups of a population is selected.
Clusters
Heterogeneous subgroups of a population.
Coefficient of Determination (r²)
Percent of variation in the response variable explained by its linear relationship with the explanatory variable.
Complement
The complement of an event is that event not occurring.
Completely Randomized Design
One in which all experimental units are assigned treatments solely by chance.
Conditional Distribution
See conditional frequencies.
Conditional Frequencies
Relative frequencies for each cell in a two-way table relative to one variable.
Conditional Probability
The probability of an event occurring given that another has occurred. The probability of A given that B has occurred is denoted as P(A|B).
Confidence Intervals
Give an estimated range that is likely to contain an unknown population parameter.
Confidence Level
The level of certainty that a population parameter exists in the calculated confidence interval.
Confounding
The situation where the effects of two or more explanatory variables on the response variable cannot be separated.
Confounding Variable
A variable whose effect on the response variable cannot be untangled from the effects of the treatment.
Contingency Table
See two-way table.
Continuous Random Variables
Those typically found by measuring, such as heights or temperatures.
Control Group
A baseline group that may be given no treatment, a faux treatment like a placebo, or an accepted treatment that is to be compared to another.
Control
The principle that potential sources of variation due to variables not under consideration must be reduced.
Convenience Sample
Composed of individuals who are easily accessed or contacted.
Correlation Coefficient (r)
A measure of the strength and direction of a linear relationship,
r = (1/(n-1)) Σ ((xi - x̄)/sx)((yi - ȳ)/sy).
Critical Value
The value that the test statistic must exceed in order to reject the null hypothesis. When computing a confidence interval, the value of t* (or z*) where ±t* (or ±z*) bounds the central C% of the t (or z) distribution.
Cumulative Frequency
The sums of the frequencies of the data values from smallest to largest.
Data Set
Collection of observations from a sample or population.
Dependent Events
Two events are called dependent when they are related and the fact that one event has occurred changes the probability that the second event occurs.
Discrete Random Variables
Those usually obtained by counting.
Disjoint Events
Events that cannot occur simultaneously.
Distribution
Frequencies of values in a data set.
Dotplot
A graphical display used with univariate data. Each data point is shown as a dot located above its numerical value on the horizontal axis.
Double-Blind
When both the subjects and data gatherers are ignorant about which treatment a subject received.
Empirical Rule (68-95-99.7 Rule)
Gives benchmarks for understanding how probability is distributed under a normal curve. In the normal distribution, about 68% of the observations are within one standard deviation of the mean, about 95% are within two standard deviations of the mean, and about 99.7% are within three standard deviations of the mean.
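The 68-95-99.7 benchmarks can be recovered from the standard normal curve. This sketch uses NormalDist from Python's standard statistics module (available in Python 3.8 and later); the helper name proportion_within is an assumption for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```

The rule's percentages are rounded versions of these areas, which is why they are described as benchmarks rather than exact values.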
Estimation
The process of determining the value of a population parameter from a sample statistic.
Expected Value
The mean of a probability distribution.
Experiment
A study where the researcher deliberately influences individuals by imposing conditions and determining the individuals' responses to those conditions.
Experimental Units
Individuals (a person, a plot of land, a machine, or any single material unit) in an experiment.
Explanatory Variable
Explains the response variable, sometimes known as the treatment variable.
Exponential Model
A model of the form y = abˣ.
Extrapolation
Using a model to predict values far outside the range of the explanatory variable, which is prone to creating unreasonable predictions.
Factors
One or more explanatory variables in an experiment.
First Quartile
Symbolized Q1, represents the median of the lower 50% of a data set.
Five-Number Summary
The minimum, first quartile (Q1), median, third quartile (Q3), and maximum values in a data set.
Frequency Table
A display organizing categorical or numerical data and how often each occurs.
Geometric Distribution
The probability distribution of a geometric random variable X. All possible outcomes of X before the first success is seen and their associated probabilities.
Geometric Random Variable
A random variable X (a) that has two possible outcomes of each trial, (b) for which the probability of a success is constant for each trial, and (c) for which each trial is independent of the other trials.
Graphical Display
A visual representation of a distribution.
Histogram
Used with univariate data, frequencies are shown on the vertical axis, and intervals or bins define the values on the horizontal axis.
Independent Events
Two events are called independent when knowing that one event has occurred does not change the probability that the second event occurs.
Independent Random Variables
If the values of one random variable have no association with the values of another, the two variables are called independent random variables.
Influential Point
An extreme value whose removal would drastically change the slope of the least-squares regression model.
Interquartile Range
Describes the spread of middle 50% of a data set, IQR = Q3 - Q1.
Joint Distribution
See joint frequencies.
Joint Frequencies
Frequencies for each cell in a two-way table relative to the total number of data.
Law of Large Numbers
The long-term relative frequency of an event gets closer to the true relative frequency as the number of trials of the random phenomenon increases.
Least-Squares Regression Line (LSRL)
The "best-fit" line that is calculated by minimizing the sum of the squares of the differences between the observed and predicted values of the line. The LSRL has the equation ŷ = b0 + b1x.
Levels
The different quantities or categories of a factor in an experiment.
Linear Regression
A method of finding the best model for a linear relationship between the explanatory and response variable.
Logarithmic Transformation
Procedure that changes a variable by taking the logarithm of each of its values.
Lurking Variable
A variable that has an effect on the outcome of a study but was not part of the investigation.
Margin of Error
A range of values to the left and right of a point estimate.
Marginal Distribution
See marginal frequencies.
Marginal Frequencies
Row totals and column totals in a two-way table.
Matched-Pairs Design
The design of a study where experimental units are naturally paired by a common characteristic, or with themselves in a before-after type of study.
Maximum
The largest numerical value in a data set.
Mean
The arithmetic average of a data set; the sum of all the values divided by the number of values, x̄ = (Σxi)/n.
Mean of a Binomial Random Variable X
μx = np.
Mean of a Discrete Random Variable
μx = Σ from i=1 to n of xiP(xi).
Mean of a Geometric Random Variable
μx = 1/p.
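The shortcut formulas for the binomial and geometric means agree with the general definition, the sum of each value times its probability. In this sketch n = 10 and p = 0.3 are hypothetical values, and the geometric sum (which is infinite) is truncated at 1000 terms as an approximation.

```python
import math

n, p = 10, 0.3   # hypothetical number of trials and probability of success

# Binomial: P(X = x) = C(n, x) * p^x * (1 - p)^(n - x) for x = 0, 1, ..., n.
binomial_pmf = {x: math.comb(n, x) * p ** x * (1 - p) ** (n - x)
                for x in range(n + 1)}
binomial_mean = sum(x * prob for x, prob in binomial_pmf.items())

# Geometric: P(X = k) = (1 - p)^(k - 1) * p for k = 1, 2, 3, ...
# The sum never ends, so it is cut off after 1000 terms (the rest is negligible).
geometric_mean = sum(k * (1 - p) ** (k - 1) * p for k in range(1, 1001))

print(binomial_mean, n * p)
print(geometric_mean, 1 / p)
```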
Measures of Center
These locate the middle of a distribution. The mean and median are measures of center.
Median
The middle value of a data set; the equal areas point, where 50% of the data are at or below this value, and 50% of the data are at or above this value.
Minimum
The smallest numerical value in a data set.
Mound-Shaped
Resembles a hill or mound; a distribution that is symmetric and unimodal.
Multiplication Rule
P(A ∩ B) = P(A) * P(B|A) is used when we are interested in the probability of two events occurring simultaneously, or in succession.
Multistage Sample
A sample resulting from multiple applications of cluster, stratified, and/or simple random sampling.
Mutually Exclusive Events
See disjoint events.
Nonresponse Bias
The situation where an individual selected to be in the sample is unwilling, or unable, to provide data.
Normal Distribution
A continuous probability distribution that appears in many situations, both natural and man-made. It has a bell-shape and the area under the normal density curve is always equal to 1.
Null Hypothesis
The hypothesis of no difference, no change, and no association. A statement of equality, usually written in the form Ho: parameter = hypothesized value.
Observational Study
Attempts to determine relationships between variables, but the researcher imposes no conditions as in an experiment.
Observed Values
Actual outcomes or data from a study or an experiment.
One-Way Table
A frequency table of one variable.
Outlier
An extreme value in a data set. Quantified by being less than Q1 - 1.5 × IQR or more than Q3 + 1.5 × IQR.
Percentiles
Divide the data set into 100 equal parts. An observation at the Pth percentile is higher than P percent of all observations.
Placebo
A faux treatment given in an experiment that resembles the real treatment under consideration.
Placebo Effect
A phenomenon where subjects show a response to a treatment merely because the treatment is imposed regardless of its actual effect.
Point Estimate
An approximate value that has been calculated for the unknown parameter.
Population
The collection of all individuals under consideration in a study.
Population Parameter
A characteristic or measure of a population.
Position
Location of a data value relative to the population
Power
The probability of correctly rejecting the null hypothesis when it is in fact false. Equal to 1 - β. See beta and Type II error.
Power Model
A function in the form of y = axᵇ.
Predicted Value
The value of the response variable predicted by a model for a given explanatory variable.
Probability
Describes the chance that a certain outcome of a random phenomenon will occur.
Probability Distribution
The probability distribution of a discrete random variable X is a function that lists all n possible outcomes of the random variable (xi) and their associated probabilities P(xi).
Probability Sample
Composed of individuals selected by chance.
P-Value
The probability of observing a test statistic as extreme as, or more extreme than, the statistic obtained from a sample, under the assumption that the null hypothesis is true.
Quantitative
A variable whose values are counts or measurements.
Random Digit Table
A chance device that is used to select experimental units or conduct simulations.
Random Phenomena
Those outcomes that are unpredictable in the short term, but nevertheless, have a long-term pattern.
Random Sample
A sample composed of individuals selected by chance.
Random Variables
Numerical outcome of a random phenomenon.
Randomization
The process by which treatments are assigned by a chance mechanism to the experimental units.
Randomized Block Design
First, units are sorted into subgroups or blocks, and then treatments are randomly assigned within the blocks.
Range
Calculated as the maximum value minus the minimum value in a data set.
Relative Frequency
Percentage or proportion of the whole number of data.
Replication
The practice of reducing chance variation by assigning each treatment to many experimental units.
Residual
Observed value minus predicted value of the response variable.
Response Bias
Because of the manner in which an interview is conducted, because of the phrasing of questions, or because of the attitude of the respondent, inaccurate data are collected.
Response Variable
Measures the outcomes that have been observed.
Sample
A selected subset of a population from which data are gathered.
Sample Statistic
Result of a sample used to estimate a parameter.
Sample Survey
A study that collects information from a sample of a population in order to determine one or more characteristics of the population.
Sampling Distribution
The probability distribution of a sample statistic when a sample is drawn from a population.
Sampling Distribution of the Sample Mean (x̄)
The distribution of sample means from all possible simple random samples of size n taken from a population.
Sampling Distribution of a Sample Proportion (p̂)
The distribution of sample proportions from all possible simple random samples of size n taken from a population.
Sampling Error
See sampling variability.
Sampling Variability
Natural variability due to the sampling process. Each possible random sample from a population will generate a different sample statistic.
Scatterplots
Used to visualize bivariate data. The explanatory variable is shown on the horizontal axis and the response variable is shown on the vertical axis.
Significance Level
The probability of a Type I error. A benchmark against which the P-value is compared to determine if the null hypothesis will be rejected. See also alpha.
Simple Random Sample (SRS)
A sample where n individuals are selected from a population in a way that every possible combination of n individuals is equally likely.
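One way to draw such a sample in Python is with the standard library's `random.sample`, which selects without replacement so that every subset of size n is equally likely. The wrapper name `simple_random_sample` and the optional `seed` parameter are assumptions added for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals so every group of n is equally likely."""
    rng = random.Random(seed)           # seed only makes the draw repeatable
    return rng.sample(list(population), n)

# Hypothetical population: individuals labeled 1 through 50.
labels = range(1, 51)
print(simple_random_sample(labels, 5, seed=1))
```

Sampling without replacement is what keeps an individual from being picked twice.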
Simulation
A method of modeling chance behavior that accurately mimics the situation being considered.
Skewed
A unimodal, asymmetric distribution that tends to slant: most of the data are clustered on one side of the distribution, and the rest "tails" off on the other side.
Standard Deviation of a Binomial Random Variable X
σₓ = √(np(1-p)).
Standard Deviation of a Discrete Random Variable X
σₓ = √(σ²ₓ).
Standard Deviation
Used to measure variability of a data set. It is calculated as the square root of the variance of a set of data,
s = √(Σ(xi - x̄)² / (n-1)).
Standard Error
An estimate of the standard deviation of the sampling distribution of a statistic.
Standard Normal Probabilities
The probabilities calculated from values of the standard normal distribution.
Standardized Score
The number of standard deviations an observation lies from the mean,
z = (observation - mean) / (standard deviation).
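A short sketch of the standardized-score formula, using the sample standard deviation (the n - 1 version) from Python's `statistics` module. The function name `z_scores` is hypothetical, and using the sample mean and sample standard deviation as the reference values is an assumption; with a known population mean and standard deviation, those would be used instead.

```python
from statistics import mean, stdev

def z_scores(data):
    """Standardize each observation: (observation - mean) / standard deviation."""
    m = mean(data)
    s = stdev(data)  # sample standard deviation, divides by n - 1
    return [(x - m) / s for x in data]

print(z_scores([1, 2, 3]))  # prints [-1.0, 0.0, 1.0]
```

For the data 1, 2, 3 the mean is 2 and the sample standard deviation is 1, so each z-score is just the distance from 2.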
Statistically Significant
When a sample statistic is shown to be far from a hypothesized parameter. When the P-value is less than the significance level.
Stemplot
Also called a stem-and-leaf plot. Data are separated into a stem and leaf by place value and organized in the form of a histogram.
Strata
Subgroups of a population that are similar or homogeneous.
Stratification
Part of the sampling process where units of the study are separated into strata.
Stratified Random Sample
A sample in which simple random samples are selected from each of several homogeneous subgroups of the population, known as strata.
Subjects
Individuals in an experiment who are people.
Symmetric
A distribution that resembles a mirror image on either side of the center.
Systematic Random Sample
A sample where every kth individual is selected from a list or queue.
Test Statistic
The number of standard deviations (standard errors) that a sample statistic lies from a hypothesized population parameter.
Third Quartile
Symbolized Q3, represents the median of the upper 50% of a data set.
Transformation
Changing the values of a data set using a mathematical operation.
Treatments
Combinations of different levels of the factors in an experiment.
Two-Way Table
A frequency table that displays two categorical variables.
Type I Error
Rejecting a null hypothesis when it is in fact true.
Type II Error
Failing to reject a null hypothesis when it is in fact false.
Undercoverage
When some individuals of a population are not included in the sampling process.
Uniform
All data values in the distribution have similar frequencies.
Unimodal
A distribution with a single, clearly defined, peak.
Univariate
One-variable data.
Variables
Characteristics of the individuals under study.
Variability
The spread in a data set.
Variance
Used to measure variability, the average of the squared deviations from the mean,
s²ₓ = Σ(xi - x̄)² / (n-1).
Variance of a Binomial Random Variable X
σ²ₓ = np(1-p).
Variance of a Discrete Random Variable X
σ²ₓ = Σ from i=1 to n of (xi - μₓ)² · P(xi).
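The variance of a discrete random variable is a probability-weighted sum of squared deviations from the mean, which can be sketched as follows. The function name `discrete_mean_and_variance` is hypothetical, and the example values are made up for illustration.

```python
from math import isclose, sqrt

def discrete_mean_and_variance(outcomes, probs):
    """Return (mean, variance) of a discrete random variable.

    outcomes: the possible values xi
    probs:    the matching probabilities P(xi), which must sum to 1
    """
    if len(outcomes) != len(probs) or not isclose(sum(probs), 1.0):
        raise ValueError("need one probability per outcome, summing to 1")
    mu = sum(x * p for x, p in zip(outcomes, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(outcomes, probs))
    return mu, var

# One trial with success probability 0.3 (a binomial with n = 1).
mu, var = discrete_mean_and_variance([0, 1], [0.7, 0.3])
print(mu, var, sqrt(var))
```

For a single trial with p = 0.3, the result agrees with the binomial shortcut np(1-p) = 0.21, and the standard deviation is its square root.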
Venn Diagram
Graphical representation of sets or outcomes and how they intersect.
Voluntary Response Bias
Bias due to the manner in which people choose to respond to voluntary surveys.
Voluntary Response Sample
Composed of individuals who choose to respond to a survey because of interest in the subject.
Z-Score
See standardized score.