Stats 145 Final Exam



Inference from a voluntary response sample can't be trusted.
(18.21) You turn your web browser to the online Harris Interactive Poll. Based on 2234 responses, the poll reports that 45% of U.S. adults believe global climate change exists and humans are the main cause, 30% believe global climate change exists but that its causes are mainly not related to humans, 13% do not believe global climate change exists, and 12% are undecided. You should refuse to calculate a 95% confidence interval for the proportion of all U.S. adults who do not believe global climate change exists based on this sample because
The average failure time of a large number of batteries has a distribution that is close to Normal.
(15.23) The number of hours a battery lasts before failing varies from battery to battery. The distribution of failure times follows an exponential distribution (see Example 15.8), which is strongly skewed to the right.
The central limit theorem says that
All customers who have purchased something in the last year.
(8.17) An online store contacts 1000 customers from its list of customers who have purchased something from them in the last year. In all, 696 of the 1000 say that they are very satisfied with the store's Web site. The population in this setting is
Z requires that you know the population standard deviation σ.
(20.17) We prefer the t procedures to the z procedures for inference about a population mean because
Confidence levels and P-values from the t procedures are quite accurate even if the population distribution is not exactly Normal.
(21.21) One major reason that the two-sample t procedures are widely used is that they are quite robust.
This means that
A matched pairs experiment.
(9.26) The Community Intervention Trial for Smoking Cessation asked whether a community-wide advertising campaign would reduce smoking. The researchers located 11 pairs of communities, each pair similar in location, size, economic status, and so on. One community in each pair participated in the advertising campaign and the other did not.
As you invest in more and more stocks chosen at random, your average return on these stocks gets ever closer to 11.7%.
(15.19) Annual returns on common small stocks available to investors vary a lot. In a recent year, the mean return was 11.7% and the standard deviation of returns was 34.1%. The law of large numbers says that
An observational study
A large representative random sample of 6906 U.S. adults collected over 20 years showed that "parents reported higher levels of life satisfaction than nonparents," with the observed difference in life satisfaction between the two groups being statistically significant.
This is an example of:
The mean is less than the median.
(2.18) If a distribution is skewed to the left,
Undercoverage.
(8.24) A sample of households in a community is selected at random from the telephone directory. In this community, 4% of households have no telephone, 10% have only cell phones, and another 25% have unlisted telephone numbers. The sample will certainly suffer from
The probability that the test rejects H0 when μ = 0 is true.
(18.25) A medical experiment compared zinc supplements with a placebo for reducing the duration of colds. Let μ denote the mean decrease, in days, in the duration of a cold. A decrease to μ = 1 is a practically important decrease. The significance level of a test of H0: μ = 0 versus Ha: μ > 0 is defined as
Prediction of gas used from degree-days will be quite accurate.
(5.28) An owner of a home in the Midwest installed solar panels to reduce heating costs. After installing the solar panels, he measured the amount of natural gas used y (in cubic feet) to heat the home and outside temperature x (in degree-days, where a day's degree-days are the number of degrees its average temperature falls below 65 degrees F) over a 23-month period. The software used to compute the least-squares regression line (ŷ = 85 + 16x) says that r^2 = 0.98. This suggests that
The five-number summary.
To make a boxplot of a distribution, you must know
That the data can be thought of as a random sample from the population of interest.
(18.19) The most important condition for sound conclusions from statistical inference is usually
A statistic.
(15.17) The Bureau of Labor Statistics announces that last month it interviewed all members of the labor force in a sample of 60,000 households; 6.7% of the people interviewed were unemployed. The number 6.7% is a
Either a pie chart or a bar graph.
(1.14) To display the distribution of grades (A, B, C, D, F) in a course, it would be correct to use
The poll used a method that gets an answer within 4% of the truth about the population 95% of the time.
(22.24) A Gallup Poll in November 2012 found that 54% of the people in the sample said they wanted to lose weight. The poll's margin of error for 95% confidence was 4%. This means that
Is the reaction time.
(4.14) Researchers collect data on 5,134 American adults younger than 60. They measure the reaction times (in seconds) of each subject to a stimulus on a computer screen and how many years later the subject died.

The researchers are interested in whether reaction time can predict time to death (in years). When you make a scatterplot, the explanatory variable on the x axis
In a test of significance, the P-value is
the probability, assuming the null hypothesis is true, that the test statistic will take a value at least as extreme as that actually observed.
The variability of a statistic is described by
the spread of its sampling distribution.
Extrapolation
is the use of a regression line for prediction far outside the range of values of the explanatory variable you used to obtain the line.
The design of a statistical study is ___________ if it systematically favors certain outcomes.
In a histogram, an individual value that falls outside the overall pattern.
When drawing a histogram, it is important to
label the vertical axis so the reader can determine the counts or percent in each class interval
A random variable can be described as
a variable whose value is a numerical outcome of a random phenomenon.
The _______ measures the direction and strength of the linear relationship between two quantitative variables.
Statistical _________ provides methods for drawing conclusions about a population from sample data.
The _______ of a variable tells us what values the variable takes and how often it takes these values.
Simple Random Sample
A ________ of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.
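A minimal sketch of drawing a simple random sample in Python. The population of customer IDs 1 to 1000 is an assumption made for illustration (it echoes the online-store question but is not real data). `random.sample` draws without replacement, so every set of n labels is equally likely to be chosen.

```python
import random

# Hypothetical population: customer IDs 1..1000 (assumed for illustration).
population = list(range(1, 1001))

rng = random.Random(145)          # fixed seed so the draw is reproducible
srs = rng.sample(population, 10)  # SRS of size n = 10, drawn without replacement

print(sorted(srs))
```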
Categorical Variable
places individuals into one of several groups or categories.
Quantitative Variable
takes numerical values for which arithmetic operations make sense (usually recorded in a unit of measurement).
Histogram
shows the distribution of a quantitative variable by using bars whose height represents the number of individuals who take on a value within a particular class.
Mean
The most common measure of center: the arithmetic average (x̄).
Median
Midpoint of a distribution: the number such that half of the observations are smaller and the other half are larger.
Density curve
a curve that is always on or above the horizontal axis and has area exactly 1 underneath it.
3 Properties of Normal Dist.
1. Single-peaked.
2. Bell-shaped.
3. Symmetric.
Response Variable
measures an outcome of a study (y).
Explanatory Variable
may explain or influence changes in a response variable (x).
Regression Line
a straight line that describes how a response variable, y, changes as an explanatory variable, x, changes.
Slope (b)
amount by which y changes when x increases by 1 unit.
Intercept (a)
value of y when x = 0.
Residual
the difference between an observed value of the response variable (y) and the value predicted by the regression line (y-hat).
Population
the entire group of individuals we want information about.
Sample
the part of the population from which we collect the information.
Sampling Design
explains exactly how to choose a sample from the population.
Observational Study
observes individuals and measures variables of interest but DOES NOT attempt to influence responses.
Experiment
deliberately imposes some treatment on individuals in order to observe their responses.
Confounding
two variables (explanatory or lurking) are confounded when their effects on a response variable cannot be distinguished from each other.
Probability
the mathematics of chance behavior.
Parameter
a value that describes the population. Typically these values are unknown.
Statistic
a value computed from sample data (not the population). Used to estimate the unknown parameter.
Statistical Inference
it provides methods for drawing conclusions about a population from sample data.
chi-square statistic
A measure of how far the observed counts in a two-way table are from the expected counts if H0 were true.
chi-square distributions
A family of distributions that take only positive values and are skewed to the right.
expected count
In any cell of a two-way table, when H0 is true, this is (row total × column total)/table total.
chi-square test
For a two-way table, this tests the null hypothesis H0 that there is no relationship between the row variable and the column variable. The alternative hypothesis Ha says that there is some relationship but does not say what kind.
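The expected counts and the chi-square statistic described in the cards above can be sketched in a few lines of Python. The 2x2 table of observed counts is made up for illustration, and the degrees-of-freedom line uses the standard (rows − 1) × (columns − 1) rule, which the cards themselves do not state.

```python
# Hypothetical 2x2 table of observed counts (made up for illustration):
# rows = two groups, columns = two outcomes.
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# Expected count in each cell = (row total x column total) / table total.
expected = [[r * c / table_total for c in col_totals] for r in row_totals]

# Chi-square statistic = sum over all cells of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))

# Degrees of freedom = (rows - 1) x (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, chi_square, df)
```

Large values of the statistic mean the observed counts are far from what H0 predicts. Turning the statistic into a P-value requires a chi-square table or software, which this sketch leaves out.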
plus four confidence interval
To get a more accurate confidence interval, add four imaginary observations, two successes and two failures, to your sample. Then use the same formula for the large-sample confidence interval. Use this interval in practice when the confidence level is at least 90% and the sample size n is at least 10, with any counts of successes and failures.
large-sample confidence interval
Draw an SRS of size n from a large population that contains an unknown proportion p of successes. For a population proportion, the level C large-sample confidence interval is p̂ ± z*√(p̂(1 − p̂)/n), where z* is the critical value for the standard Normal curve with area C between −z* and z*. Use this interval only when the numbers of successes and failures in the sample are both at least 15.
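A hedged sketch of both intervals for a proportion, assuming the standard textbook formula p̂ ± z*√(p̂(1 − p̂)/n). The function names are hypothetical. The counts 696 out of 1000 come from the online-store question earlier in this set, and `statistics.NormalDist` (Python 3.8+) supplies the critical value z*.

```python
from math import sqrt
from statistics import NormalDist

def z_star(conf_level):
    """Critical value z* with area conf_level between -z* and z*."""
    return NormalDist().inv_cdf((1 + conf_level) / 2)

def large_sample_ci(successes, n, conf_level=0.95):
    """Large-sample interval: p-hat +/- z* * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = successes / n
    margin = z_star(conf_level) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def plus_four_ci(successes, n, conf_level=0.95):
    """Plus four interval: add two imaginary successes and two imaginary failures."""
    return large_sample_ci(successes + 2, n + 4, conf_level)

print(large_sample_ci(696, 1000))
print(plus_four_ci(696, 1000))
```

With a sample this large the two intervals nearly coincide. The plus four adjustment matters mostly for small samples.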
sample proportion
Denoted p̂, the proportion of successes in the sample. Tests and confidence intervals for a population proportion p, when the data are an SRS of size n, are based on this statistic.
one-sample z statistic
The standardized sample mean.
matched pairs t procedures
To compare the responses to the two treatments in a matched pairs design, find the difference between the responses within each pair. Then apply the one-sample t procedures to these differences.
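As an illustration of the matched pairs recipe, the sketch below takes within-pair differences and then computes the one-sample t statistic on them. The five pairs of responses are invented for the example, and the null hypothesis of mean difference 0 is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical responses for 5 matched pairs (made up for illustration).
treatment = [12.1, 11.4, 13.0, 12.7, 11.9]
control   = [11.6, 11.5, 12.2, 12.0, 11.3]

# Step 1: one difference per pair.
diffs = [t - c for t, c in zip(treatment, control)]

# Step 2: one-sample t statistic on the differences, testing H0: mean difference = 0.
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)                 # sample standard deviation of the differences
standard_error = s_d / sqrt(n)     # standard error of the mean difference
t_stat = (d_bar - 0) / standard_error
df = n - 1                         # degrees of freedom for the t distribution

print(round(d_bar, 3), round(t_stat, 3), df)
```

The P-value would then come from the t distribution with n − 1 degrees of freedom, using a table or software.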
standard error
The result when the standard deviation of a statistic is estimated from data. Of the sample mean x̄, it is s/√n.
robust
Description of a confidence interval or significance test if the confidence level or P-value does not change very much when the conditions for use of the procedure are violated.
power
Measures the ability of a significance test to detect an alternative hypothesis. Against a specific alternative, this is the probability that the test will reject H0 at a chosen significance level α when the specified alternative value of the parameter is true.
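The probability described above can be computed directly for a one-sided z test. This is a sketch under stated assumptions: the hypotheses H0: μ = 0 versus Ha: μ > 0 and the alternative μ = 1 come from the zinc question earlier in this set, but σ = 2, n = 25, and the function name are hypothetical values chosen only to make the arithmetic concrete.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0, mu_alt, sigma, n, alpha=0.05):
    """Power of the one-sided z test of H0: mu = mu0 vs Ha: mu > mu0
    against the specific alternative mu = mu_alt (sigma assumed known)."""
    se = sigma / sqrt(n)
    # The test rejects H0 when x-bar is at least this cutoff.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Probability that x-bar lands in the rejection region when mu = mu_alt.
    return 1 - NormalDist(mu_alt, se).cdf(cutoff)

power = power_one_sided_z(mu0=0, mu_alt=1, sigma=2, n=25)
type_ii_error = 1 - power   # probability of failing to reject H0 when mu = 1

print(round(power, 3), round(type_ii_error, 3))
```

Setting `mu_alt` equal to `mu0` returns α, which matches the definition of the significance level as the probability of a Type I error.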
Type I error
Type of error made when we reject the null hypothesis when in fact it is true.
significance level
Denoted by α, the probability of a Type I error of any fixed level test.
Type II error
Type of error made when we fail to reject the null hypothesis when the alternative hypothesis is true.
null hypothesis
Denoted by H0, the claim being tested by a statistical test. Usually this is a statement of "no effect" or "no difference"
alternative hypothesis
Denoted by Ha, the claim about a population that we are trying to find evidence for.
test of significance
A test to assess the evidence provided by data against a null hypothesis H0 in favor of an alternative hypothesis Ha.
one-sided alternative
The alternative hypothesis is this if it states that a parameter is larger than or smaller than the null hypothesis value.
test statistic
Calculated from the sample data, measures how far the data diverge from what we would expect if the null hypothesis H0 were true. Large values of the statistic show that the data are not consistent with H0.
P-value
The probability, computed assuming that H0 is true, that the test statistic would take a value as extreme or more extreme than that actually observed. The smaller it is, the stronger the evidence against H0 provided by the data.
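A small sketch of how this probability is found for the one-sample z statistic z = (x̄ − μ0)/(σ/√n). The inputs (x̄ = 0.8, σ = 2, n = 25) and the function name are hypothetical. The `alternative` argument decides which tail or tails count as "at least as extreme".

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(x_bar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, P-value) for the one-sample z test with known sigma."""
    z = (x_bar - mu0) / (sigma / sqrt(n))   # standardized sample mean
    std_normal = NormalDist()
    if alternative == "greater":            # Ha: mu > mu0
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":             # Ha: mu < mu0
        p = std_normal.cdf(z)
    else:                                   # Ha: mu != mu0, both tails count
        p = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p

z, p_value = z_test_p_value(x_bar=0.8, mu0=0, sigma=2, n=25, alternative="greater")
print(round(z, 2), round(p_value, 4))
```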
statistical significance
An observed effect so large that it would rarely occur by chance.
two-sided alternative
The alternative hypothesis is this if it states that the parameter is different from the null hypothesis value (it could be either smaller or larger).
confidence interval
Uses sample data to estimate an unknown population parameter with an indication of how accurate the estimate is and of how confident we are that the result is correct. It has two parts: an interval calculated from the data and a confidence level. It often has the form: estimate ± margin of error.
confidence level
The success rate of the method that produces the confidence interval. Gives the probability that the interval will capture the true parameter value in repeated samples.
critical value
Chosen so that the standard Normal curve has area C between −z* and z*.
law of large numbers
Draw observations at random from any population with finite mean µ. As the number of observations drawn increases, the mean x̄ of the observed values tends to get closer and closer to the mean µ of the population.
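A quick simulation can make this concrete. The mean 11.7 and standard deviation 34.1 are taken from the small-stocks question earlier in this set. Treating individual returns as Normal is an assumption made only so there is something to draw from, because the law itself needs only a finite mean.

```python
import random
from statistics import mean

rng = random.Random(0)      # fixed seed for reproducibility
MU, SIGMA = 11.7, 34.1      # from the small-stocks example; Normal shape is assumed

returns = [rng.gauss(MU, SIGMA) for _ in range(100_000)]

# The mean of the first n observations should settle near MU as n grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(mean(returns[:n]), 2))
```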
statistic
A number that can be computed from the sample data without making use of any unknown parameters. In practice, we often use this to estimate an unknown parameter.
population distribution
The distribution of values of a variable among all the individuals in the population.
central limit theorem
States that for large n, the sampling distribution of the sample mean x̄ is approximately Normal for any population with mean µ and finite standard deviation σ. That is, averages are more Normal than individual observations.
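The sketch below simulates this for a strongly right-skewed population, in the spirit of the battery failure-time question. It assumes an exponential distribution with mean 1 (so σ = 1), samples of size n = 50, and 2000 repetitions. All three numbers are illustrative choices.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)      # fixed seed for reproducibility
n, reps = 50, 2000          # sample size and number of simulated samples (assumed)

# Each entry is the mean of one sample of n exponential(1) observations.
sample_means = [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

center = mean(sample_means)     # should be near the population mean, 1
spread = stdev(sample_means)    # should be near sigma / sqrt(n) = 1 / sqrt(50)

# Share of sample means within 2 standard deviations of the population mean;
# roughly 95% is what an approximately Normal sampling distribution predicts.
within_2sd = sum(abs(m - 1.0) <= 2 / n ** 0.5 for m in sample_means) / reps

print(round(center, 3), round(spread, 3), round(within_2sd, 3))
```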
sampling distribution
The distribution of values taken by a statistic in all possible samples of the same size from the same population.
probability distribution
Tells us what values a random variable X can take and how to assign probabilities to those values.
continuous probability model
Model that assigns probabilities as areas under a density curve. The area under the curve and above any range of values is the probability of an outcome in that range.
event
An outcome or set of outcomes of a random phenomenon. That is, it is a subset of the sample space.
probability model
A mathematical description of a random phenomenon consisting of two parts: a sample space S and a way of assigning probabilities to events.
finite probability model
A probability model with a finite sample space. To assign probabilities in this model, list the probabilities of all the individual outcomes. These must be numbers between 0 and 1 that add to exactly 1. The probability of any event is the sum of the probabilities of the outcomes making up the event.
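One way to see these rules at work is a model for a single roll of a fair six-sided die, which is an assumed example rather than one from the set. Exact fractions avoid rounding issues when checking that the probabilities add to exactly 1.

```python
from fractions import Fraction

# Finite probability model: sample space {1,...,6}, each outcome with probability 1/6.
model = {face: Fraction(1, 6) for face in range(1, 7)}

# The two requirements: each probability is between 0 and 1, and they add to exactly 1.
assert all(0 <= p <= 1 for p in model.values())
assert sum(model.values()) == 1

def probability(event):
    """Probability of an event = sum of the probabilities of its outcomes."""
    return sum(model[outcome] for outcome in event)

even = {2, 4, 6}            # the event "roll an even number"
p_even = probability(even)
print(p_even)
```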
probability
The proportion of times an outcome of a random phenomenon would occur in a very long series of repetitions.
personal probability
A number between 0 and 1 that expresses an individual's judgment of how likely a particular outcome is.
discrete random variable
A random variable that has a finite list of possible outcomes.
continuous random variable
A random variable that can take on any value in an interval, with probabilities given as areas under a density curve.
random
Term given to a phenomenon if individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions.
sample space
The set of all possible outcomes of a random phenomenon.
random variable
A variable whose value is a numerical outcome of a random phenomenon.
block
A group of individuals that are known before the experiment to be similar in some way that is expected to affect the response to the treatments.
block design
Design of an experiment where the random assignment of individuals to treatments is carried out separately within each block.
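A minimal sketch of randomizing separately within each block. The blocks (women and men), the subject labels, and the two treatments A and B are all hypothetical. Each block is shuffled on its own and then split evenly between the treatments.

```python
import random

# Hypothetical subjects grouped into blocks (labels are made up for illustration).
blocks = {
    "women": ["W1", "W2", "W3", "W4", "W5", "W6"],
    "men":   ["M1", "M2", "M3", "M4", "M5", "M6"],
}
treatments = ["A", "B"]

rng = random.Random(7)      # fixed seed so the assignment is reproducible
assignment = {}
for block_name, subjects in blocks.items():
    shuffled = subjects[:]          # copy so the original list is untouched
    rng.shuffle(shuffled)           # random order within this block only
    half = len(shuffled) // 2
    for subject in shuffled[:half]:
        assignment[subject] = treatments[0]
    for subject in shuffled[half:]:
        assignment[subject] = treatments[1]

print(assignment)
```

Because the shuffle happens inside each block, every treatment group ends up with the same number of women and the same number of men.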
factors
The explanatory variables in an experiment.
subjects
The individuals studied in an experiment, particularly when they are people.
double-blind experiment
An experiment where neither the subjects nor the people who interact with them know which treatment each subject is receiving.
undercoverage
When some groups in the population are left out of the process of choosing the sample.
least-squares regression line
The line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
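The line can be computed by hand from the standard formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The sketch below does so on a made-up data set and also lists the residuals (observed y minus predicted ŷ). The function name is hypothetical.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx               # slope: change in y-hat per 1-unit increase in x
    a = y_bar - b * x_bar       # intercept: value of y-hat when x = 0
    return a, b

# Hypothetical data, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares(x, y)
predicted = [a + b * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, predicted)]   # observed minus predicted

print(round(a, 3), round(b, 3), [round(r, 3) for r in residuals])
```

The residuals from a least-squares line always sum to zero (up to rounding), which is a handy check on the arithmetic.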
68-95-99.7 rule
In the Normal distribution with mean μ and standard deviation σ, approximately 68% of the observations fall within σ of the mean μ, approximately 95% of the observations fall within 2σ of μ, and approximately 99.7% of the observations fall within 3σ of μ.
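The three percentages can be checked against exact areas under the standard Normal curve. Working in standard deviation units is enough, because any Normal distribution standardizes to this one. The helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

def area_within(k):
    """Area under the standard Normal curve between -k and k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(k, round(area, 4))
```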
five-number summary
Consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation of a distribution, written in order from smallest to largest.
interquartile range
The distance between the first and third quartiles.
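A sketch of the five-number summary and interquartile range for a small made-up data set. Quartile conventions differ between textbooks and software. This version assumes the common rule that Q1 and Q3 are the medians of the observations below and above the position of the overall median.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of the halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                                # observations below the median's position
    upper = xs[half + 1:] if n % 2 else xs[half:]    # observations above it
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Hypothetical data, made up for illustration.
data = [1, 3, 4, 7, 8, 10, 12, 15, 20]

summary = five_number_summary(data)
iqr = summary[3] - summary[1]    # interquartile range = Q3 - Q1

print(summary, iqr)
```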
variance
An average of the squares of the deviations of a set of observations from their mean. Equal to the standard deviation squared.
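A short check of this definition on made-up data. It assumes the usual sample convention of dividing the sum of squared deviations by n − 1, which is also what `statistics.variance` does.

```python
from statistics import mean, stdev, variance

# Hypothetical data, made up for illustration.
data = [2, 4, 4, 5, 5]

x_bar = mean(data)
squared_deviations = [(x - x_bar) ** 2 for x in data]
s_squared = sum(squared_deviations) / (len(data) - 1)   # divide by n - 1

print(s_squared, variance(data), stdev(data) ** 2)
```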