
Inference from a voluntary response sample can't be trusted.

(18.21) You turn your web browser to the online Harris Interactive Poll. Based on 2234 responses, the poll reports that 45% of U.S. adults believe global climate change exists and humans are the main cause, 30% believe global climate change exists but that its causes are mainly not related to humans, 13% do not believe global climate change exists, and 12% are undecided. You should refuse to calculate a 95% confidence interval for the proportion of all U.S. adults who do not believe global climate change exists based on this sample because

The average failure time of a large number of batteries has a distribution that is close to Normal.

(15.23) The number of hours a battery lasts before failing varies from battery to battery. The distribution of failure times follows an exponential distribution (see Example 15.8), which is strongly skewed to the right.

The distribution of burnout times is strongly skewed to the right. The central limit theorem says that

All customers who have purchased something in the last year.

(8.17) An online store contacts 1000 customers from its list of customers who have purchased something from them in the last year. In all, 696 of the 1000 say that they are very satisfied with the store's Web site. The population in this setting is

The z procedures require that you know the population standard deviation σ.

(20.17) We prefer the t procedures to the z procedures for inference about a population mean because

Confidence levels and P-values from the t procedures are quite accurate even if the population distribution is not exactly Normal.

(21.21) One major reason that the two-sample t procedures are widely used is that they are quite robust.

This means that

A matched pairs experiment.

(9.26) The Community Intervention Trial for Smoking Cessation asked whether a community-wide advertising campaign would reduce smoking. The researchers located 11 pairs of communities, each pair similar in location, size, economic status, and so on. One community in each pair participated in the advertising campaign and the other did not.

As you invest in more and more stocks chosen at random, your average return on these stocks gets ever closer to 11.7%.

(15.19) Annual returns on common small stocks available to investors vary a lot. In a recent year, the mean return was 11.7% and the standard deviation of returns was 34.1%. The law of large numbers says that

An observational study

A large representative random sample of 6906 U.S. adults collected over 20 years showed that "parents reported higher levels of life satisfaction than nonparents," with the observed difference in life satisfaction between the two groups being statistically significant.

This is an example of:

The mean is less than the median.

(2.18) If a distribution is skewed to the left,

Undercoverage.

(8.24) A sample of households in a community is selected at random from the telephone directory. In this community, 4% of households have no telephone, 10% have only cell phones, and another 25% have unlisted telephone numbers. The sample will certainly suffer from

The probability that the test rejects H0 when μ = 0 is true.

(18.25) A medical experiment compared zinc supplements with a placebo for reducing the duration of colds. Let μ denote the mean decrease, in days, in the duration of a cold. A decrease to μ = 1 is a practically important decrease. The significance level of a test of H0: μ = 0 versus Ha: μ > 0 is defined as

Prediction of gas used from degree-days will be quite accurate.

(5.28) An owner of a home in the Midwest installed solar panels to reduce heating costs. After installing the solar panels, he measured the amount of natural gas used y (in cubic feet) to heat the home and outside temperature x (in degree-days, where a day's degree-days are the number of degrees its average temperature falls below 65 degrees F) over a 23-month period. The software used to compute the least-squares regression line (ŷ = 85 + 16x) says that r² = 0.98. This suggests that

The five-number summary.

To make a boxplot of a distribution, you must know

That the data can be thought of as a random sample from the population of interest.

(18.19) The most important condition for sound conclusions from statistical inference is usually

Statistic.

(15.17) The Bureau of Labor Statistics announces that last month it interviewed all members of the labor force in a sample of 60,000 households; 6.7% of the people interviewed were unemployed. The boldface number is a

Either a pie chart or a bar graph.

(1.14) To display the distribution of grades (A, B, C, D, F) in a course, it would be correct to use

The poll used a method that gets an answer within 4% of the truth about the population 95% of the time.

(22.24) A Gallup Poll in November 2012 found that 54% of the people in the sample said they wanted to lose weight. The poll's margin of error for 95% confidence was 4%. This means that

Is the reaction time.

(4.14) Researchers collect data on 5,134 American adults younger than 60. They measure the reaction times (in seconds) of each subject to a stimulus on a computer screen and how many years later the subject died.

The researchers are interested in whether reaction time can predict time to death (in years). When you make a scatterplot, the explanatory variable on the x axis

In a test of significance, the P-value is

the probability, assuming the null hypothesis is true, that the test statistic will take a value at least as extreme as that actually observed.

The variability of a statistic is described by

the spread of its sampling distribution.

Extrapolation

is the use of a regression line for prediction far outside the range of values of the explanatory variable you used to obtain the line.

Biased

The design of a statistical study is __________ if it systematically favors certain outcomes.

Outlier

In a histogram, an individual value that falls outside the overall pattern.

When drawing a histogram, it is important to

label the vertical axis so the reader can determine the counts or percent in each class interval

A random variable can be described as

a variable whose value is a numerical outcome of a random phenomenon.

Correlation

The _______ measures the direction and strength of the linear relationship between two quantitative variables.
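One way to see what the correlation measures is to compute it directly. In the sketch below, r is the average product of the standardized x and y values, using divisor n − 1. The function name and the data are hypothetical and chosen only for illustration.

```python
import math

def correlation(xs, ys):
    """Correlation r: average product of standardized x and y values (divisor n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: a perfect positive and a perfect negative straight-line pattern.
print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))
```

The sign of r gives the direction of the relationship. Its distance from 0 gives the strength, and points that lie exactly on a line produce r = ±1.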

Inference

Statistical _________ provides methods for drawing conclusions about a population from sample data.

Distribution

The _______ of a variable tells us what values the variable takes and how often it takes these values.

Simple Random Sample

A ________ of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.

Categorical

places individuals into one of several groups or categories

Quantitative

takes numerical values for which arithmetic operations make sense (usually recorded in a unit of measurement).

Histograms

show the distribution of a quantitative variable by using bars whose height represents the number of individuals who take on a value within a particular class.

Mean

The most common measure of center: the arithmetic average, written x̄ (x-bar).

Median

Midpoint of a distribution: the number such that half of the observations are smaller and the other half are larger.
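The sketch below compares the mean and the median on a small hypothetical data set with one unusually small value. The low value pulls the mean below the median, which is the pattern the skewed-left card above describes. It uses Python's standard `statistics` module.

```python
import statistics

# Hypothetical left-skewed data: one unusually small value drags the mean down.
data = [2, 9, 10, 10, 11, 12]

mean = statistics.mean(data)      # arithmetic average, x-bar
median = statistics.median(data)  # midpoint: half the observations below, half above

print(f"mean = {mean:.2f}, median = {median}")
```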

Density curve

a curve that is always on or above the horizontal axis and has area exactly 1 underneath it.

3 Properties of Normal Dist.

1. Single-peaked.

2. Bell-shaped.

3. Symmetric.

Response Variable

measures an outcome of a study (y).

Explanatory Variable

may explain or influence changes in a response variable (x).

Regression Line

a straight line that describes how a response variable y changes as an explanatory variable x changes.

Slope (b)

amount by which y changes when x increases by 1 unit.

Y-intercept

predicted value of y when x = 0.

Residuals

the difference between an observed value of the response variable (y) and the value predicted by the regression line (ŷ, "y-hat").
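The slope, intercept, and residual cards fit together as follows. The sketch fits the least-squares line ŷ = a + bx to a small hypothetical data set and then computes each residual as observed y minus predicted ŷ. The function and data are illustrative assumptions.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope: change in predicted y when x increases by 1 unit
    a = mean_y - b * mean_x  # intercept: predicted y when x = 0
    return a, b

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 4, 7, 8]
a, b = least_squares(xs, ys)

# Residual = observed y minus the value predicted by the line.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(a, b, residuals)
```

The residuals from a least-squares line always sum to zero, up to floating-point rounding. This gives a quick check on the arithmetic.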

Population

the entire group of individuals we want information from.

Sample

the part of the population from which we collect the information.

Sampling Design

explains exactly how to choose a sample from the population.

Observational Study

observes individuals and measures variables of interest but DOES NOT attempt to influence responses.

Experiment

deliberately imposes some treatment on individuals in order to observe the responses.

Confounded

two variables (explanatory or lurking) are confounded when their effects on a response variable cannot be distinguished from each other.

Probability

the mathematics of chance behavior.

Parameter

a value that describes the population. Typically these values are unknown.

Statistic

a value computed from sample data (not the population). Used to estimate the unknown parameter.

Statistical Inference

it provides methods for drawing conclusions about a population from sample data.

chi-square statistic

A measure of how far the observed counts in a two-way table are from the expected counts if H0 were true.

chi-square distributions

A family of distributions that take only positive values and are skewed to the right.

expected count

In any cell of a two-way table, the count we would expect if H0 were true: (row total × column total)/table total.
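The expected-count rule and the chi-square statistic card can be combined in a short computation. The sketch finds each expected count as (row total × column total)/table total and then sums (observed − expected)²/expected over all cells. The 2×2 table is hypothetical.

```python
def expected_counts(table):
    """Expected count in each cell if H0 is true: (row total * column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

# Hypothetical 2x2 table of counts (rows: groups, columns: outcomes).
observed = [[30, 20],
            [20, 30]]
print(expected_counts(observed))
print(chi_square_statistic(observed))
```

To get a P-value, the statistic is compared with a chi-square distribution. For an r × c table that distribution has (r − 1)(c − 1) degrees of freedom.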

chi-square test

For a two-way table, this tests the null hypothesis H0 that there is no relationship between the row variable and the column variable. The alternative hypothesis Ha says that there is some relationship but does not say what kind.

plus four confidence interval

To get a more accurate confidence interval, add four imaginary observations, two successes and two failures, to your sample. Then use the same formula for the large-sample confidence interval. Use this interval in practice when the confidence level is at least 90% and the sample size n is at least 10, with any counts of successes and failures.
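Here is a minimal sketch of the plus four interval, assuming the usual form p̃ ± z*√(p̃(1 − p̃)/(n + 4)) with p̃ = (successes + 2)/(n + 4). The function name and the example counts are hypothetical. The critical value comes from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def plus_four_interval(successes, n, confidence=0.95):
    """Plus four CI: add 2 successes and 2 failures, then apply the large-sample formula."""
    p_tilde = (successes + 2) / (n + 4)
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # critical value z*
    margin = z_star * math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - margin, p_tilde + margin

# Hypothetical small sample: 3 successes in n = 20 trials.
low, high = plus_four_interval(3, 20)
print(round(low, 3), round(high, 3))
```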

large-sample confidence interval

Draw an SRS of size n from a large population that contains an unknown proportion p of successes. For a population proportion, the level C large-sample confidence interval is p̂ ± z*√(p̂(1 − p̂)/n), where z* is the critical value for confidence level C. Use this interval only when the numbers of successes and failures in the sample are both at least 15.
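The large-sample interval p̂ ± z*√(p̂(1 − p̂)/n) can be sketched the same way, including the check that there are at least 15 successes and 15 failures. The counts reuse the online-store numbers from exercise 8.17 (696 of 1000). They are treated as an SRS here purely for illustration, which is an assumption the exercise does not make.

```python
import math
from statistics import NormalDist

def large_sample_interval(successes, n, confidence=0.95):
    """Large-sample CI: p-hat +/- z* * sqrt(p-hat * (1 - p-hat) / n)."""
    failures = n - successes
    if successes < 15 or failures < 15:
        raise ValueError("need at least 15 successes and 15 failures")
    p_hat = successes / n
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # critical value z*
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Illustration only: 696 "very satisfied" out of 1000, assumed to be an SRS.
low, high = large_sample_interval(696, 1000)
print(round(low, 3), round(high, 3))
```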

sample proportion

Denoted p̂. When the data are an SRS of size n, tests and confidence intervals for a population proportion p are based on this statistic.

one-sample z statistic

The standardized sample mean.
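Standardizing x̄ means subtracting the hypothesized mean μ0 and dividing by the standard deviation of x̄, which is σ/√n. This sketch assumes σ is known, as the z procedures require. All the numbers are hypothetical.

```python
import math

def one_sample_z(x_bar, mu0, sigma, n):
    """Standardized sample mean: z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))

# Hypothetical numbers: x_bar = 1.2, mu0 = 0, sigma = 3, n = 25, so sigma / sqrt(n) = 0.6.
z = one_sample_z(1.2, 0.0, 3.0, 25)
print(round(z, 3))
```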

matched pairs t procedures

To compare the responses to the two treatments in a matched pairs design, find the difference between the responses within each pair. Then apply the one-sample t procedures to these differences.
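The matched pairs recipe reduces to a one-sample t statistic on the within-pair differences. The sketch below uses hypothetical before/after measurements and returns the t statistic together with its n − 1 degrees of freedom. Finding the P-value from a t table or software is left out.

```python
import math
import statistics

def matched_pairs_t(before, after):
    """One-sample t statistic on within-pair differences, for H0: mean difference = 0."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)          # sample standard deviation (divisor n - 1)
    t = d_bar / (s_d / math.sqrt(n))       # standard error of d-bar is s_d / sqrt(n)
    return t, n - 1                        # t statistic and degrees of freedom

# Hypothetical paired measurements on 4 subjects.
before = [10, 12, 9, 11]
after = [12, 13, 11, 14]
t, df = matched_pairs_t(before, after)
print(round(t, 3), df)
```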

standard error

The result when the standard deviation of a statistic is estimated from data. The standard error of the sample mean x̄ is s/√n.

robust

Description of a confidence interval or significance test if the confidence level or P-value does not change very much when the conditions for use of the procedure are violated.

power

Measures the ability of a significance test to detect an alternative hypothesis. Against a specific alternative, this is the probability that the test will reject H0 at a chosen significance level α when the specified alternative value of the parameter is true.
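For the special case of a one-sided z test of H0: μ = μ0 against Ha: μ > μ0 with σ known, power has a closed form. The test rejects when z ≥ z_α. Under the alternative μa, z is shifted up by (μa − μ0)/(σ/√n), so power = P(Z ≥ z_α − shift). The σ and n values below are hypothetical. The alternative μ = 1 echoes the "practically important" decrease in the zinc example.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu0, mu_alt, sigma, n, alpha=0.05):
    """Power of the z test of H0: mu = mu0 vs Ha: mu > mu0 when mu = mu_alt is true."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)            # reject H0 when z >= z_alpha
    shift = (mu_alt - mu0) / (sigma / math.sqrt(n))    # how far the alternative moves z
    return 1 - std_normal.cdf(z_alpha - shift)

# Hypothetical: sigma = 2 days, alternative mu = 1 day, several sample sizes.
for n in (10, 25, 50):
    print(n, round(power_one_sided_z(0, 1, 2.0, n), 3))
```

When μa equals μ0, the formula gives back α. Power then rises toward 1 as n grows or as the alternative moves farther from μ0.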

Type I error

Type of error made when we reject the null hypothesis when in fact it is true.

significance level

Denoted by α. For any fixed level test, it is the probability of a Type I error.

Type II error

Type of error made when we fail to reject the null hypothesis when the alternative hypothesis is true.

null hypothesis

Denoted by H0, the claim being tested by a statistical test. Usually this is a statement of "no effect" or "no difference."

alternative hypothesis

Denoted by Ha, the claim about a population that we are trying to find evidence for.

test of significance

A test to assess the evidence provided by data against a null hypothesis H0 in favor of an alternative hypothesis Ha.

one-sided

The alternative hypothesis is this if it states that a parameter is larger than or smaller than the null hypothesis value.

test statistic

Calculated from the sample data, measures how far the data diverge from what we would expect if the null hypothesis H0 were true. Large values of the statistic show that the data are not consistent with H0.

P-value

The probability, computed assuming that H0 is true, that the test statistic would take a value as extreme or more extreme than that actually observed. The smaller it is, the stronger the evidence against H0 provided by the data.
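For a z statistic, the P-value is an area under the standard Normal curve, computed assuming H0 is true. Which tail or tails count as "at least as extreme" depends on the alternative. This sketch uses `statistics.NormalDist`. The string labels for the alternatives are a hypothetical convention.

```python
from statistics import NormalDist

def z_test_p_value(z, alternative="two-sided"):
    """P-value for an observed z statistic, computed assuming H0 is true."""
    std_normal = NormalDist()
    if alternative == "greater":      # one-sided Ha: parameter > null value
        return 1 - std_normal.cdf(z)
    if alternative == "less":         # one-sided Ha: parameter < null value
        return std_normal.cdf(z)
    if alternative == "two-sided":    # Ha: parameter differs from the null value
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

print(round(z_test_p_value(1.96), 4))
print(round(z_test_p_value(1.96, "greater"), 4))
```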

statistical significance

An observed effect so large that it would rarely occur by chance.

two-sided

The alternative hypothesis is this if it states that the parameter is different from the null hypothesis value (it could be either smaller or larger).

confidence interval

Uses sample data to estimate an unknown population parameter with an indication of how accurate the estimate is and of how confident we are that the result is correct. It has two parts: an interval calculated from the data and a confidence level. It often has the form: estimate ± margin of error.

statistical inference

Provides methods for drawing conclusions about a population from sample data.

confidence level

The success rate of the method that produces the confidence interval. Gives the probability that the interval will capture the true parameter value in repeated samples.

critical value

The number z* chosen so that the standard Normal curve has area C between −z* and z*.
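Because the area between −z* and z* is C, the area to the left of z* is (1 + C)/2. The critical value is therefore an inverse-CDF lookup. A sketch using `statistics.NormalDist`:

```python
from statistics import NormalDist

def z_critical(confidence):
    """z* such that the standard Normal curve has area `confidence` between -z* and z*."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

for c in (0.90, 0.95, 0.99):
    print(c, round(z_critical(c), 3))
```

This reproduces the familiar table values for 90%, 95%, and 99% confidence.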

law of large numbers

Draw observations at random from any population with finite mean µ. As the number of observations drawn increases, the mean x̄ of the observed values tends to get closer and closer to the mean µ of the population.
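A short simulation shows the law of large numbers at work. It draws at random, with replacement, from a hypothetical population (the faces of a fair die, with μ = 3.5) and tracks the running mean x̄. The seed is fixed so the run is reproducible. The particular values depend on it, but the drift toward 3.5 does not.

```python
import random

def running_means(population, draws, seed=1):
    """Mean of the first 1, 2, ..., draws observations sampled at random with replacement."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    means = []
    for i in range(1, draws + 1):
        total += rng.choice(population)
        means.append(total / i)
    return means

# Hypothetical population: faces of a fair die, whose mean mu is 3.5.
die = [1, 2, 3, 4, 5, 6]
means = running_means(die, 100_000)
for n in (10, 1_000, 100_000):
    print(n, round(means[n - 1], 3))
```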

statistic

A number that can be computed from the sample data without making use of any unknown parameters. In practice, we often use this to estimate an unknown parameter.

population distribution

The distribution of values of a variable among all the individuals in the population.

central limit theorem

States that for large n, the sampling distribution of the sample mean x̄ is approximately Normal for any population with mean µ and finite standard deviation σ. That is, averages are more Normal than individual observations.
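The simulation below illustrates "averages are more Normal than individual observations." It assumes a right-skewed exponential model with mean 1, similar to the battery failure times in exercise 15.23. It compares the skewness of single observations with the skewness of means of n = 50 observations. The seed, n, and number of repetitions are arbitrary choices.

```python
import random
import statistics

def skewness(values):
    """Average cubed standardized deviation: near 0 for symmetric data, positive for right skew."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(2024)   # fixed seed for reproducibility
n = 50                      # observations averaged in each sample
reps = 10_000               # number of simulated samples

# Hypothetical model: exponential failure times with mean 1 (strongly right-skewed).
single_times = [rng.expovariate(1.0) for _ in range(reps)]
sample_means = [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

print("skewness of individual times:", round(skewness(single_times), 2))
print("skewness of sample means:    ", round(skewness(sample_means), 2))
print("mean, sd of sample means:    ", round(statistics.fmean(sample_means), 3),
      round(statistics.stdev(sample_means), 3))
```

In theory the exponential distribution has skewness 2, while the mean of 50 such observations has skewness 2/√50 ≈ 0.28. The means also center at μ = 1 with standard deviation close to σ/√n = 1/√50.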

sampling distribution

The distribution of values taken by a statistic in all possible samples of the same size from the same population.

probability distribution

Tells us what values a random variable X can take and how to assign probabilities to those values.

continuous probability model

Model that assigns probabilities as areas under a density curve. The area under the curve and above any range of values is the probability of an outcome in that range.

event

An outcome or set of outcomes of a random phenomenon. That is, it is a subset of the sample space.

probability model

A mathematical description of a random phenomenon consisting of two parts: a sample space S and a way of assigning probabilities to events.

finite probability model

A probability model with a finite sample space. To assign probabilities in this model, list the probabilities of all the individual outcomes. These must be numbers between 0 and 1 that add to exactly 1. The probability of any event is the sum of the probabilities of the outcomes making up the event.
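Both rules for a finite model translate directly into code: the validity check (each probability is between 0 and 1, and the total is 1) and the sum for an event. The fair-die model and function names below are hypothetical. The sum is compared with `math.isclose` because floating-point fractions such as 1/6 may not add to exactly 1.

```python
import math

# Hypothetical finite model: a fair six-sided die.
model = {face: 1 / 6 for face in range(1, 7)}

def is_valid_model(probs):
    """Each probability is in [0, 1] and together they add to 1 (within float rounding)."""
    return all(0 <= p <= 1 for p in probs.values()) and math.isclose(sum(probs.values()), 1.0)

def event_probability(probs, event):
    """P(event) = sum of the probabilities of the outcomes making up the event."""
    return sum(probs[outcome] for outcome in event)

even = {2, 4, 6}
print(is_valid_model(model), event_probability(model, even))
```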

probability

The proportion of times an outcome of a random phenomenon would occur in a very long series of repetitions.

personal probability

A number between 0 and 1 that expresses an individual's judgment of how likely a particular outcome is.

discrete random variable

A random variable that has a finite list of possible outcomes.

continuous random variable

A random variable that can take on any value in an interval, with probabilities given as areas under a density curve.

random

Term given to a phenomenon if individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions.

sample space

The set of all possible outcomes of a random phenomenon.

random variable

A variable whose value is a numerical outcome of a random phenomenon.

block

A group of individuals that are known before the experiment to be similar in some way that is expected to affect the response to the treatments.

block design

Design of an experiment where the random assignment of individuals to treatments is carried out separately within each block.

factors

The explanatory variables in an experiment.

subjects

The individuals studied in an experiment, particularly when they are people.

double-blind experiment

An experiment where neither the subjects nor the people who interact with them know which treatment each subject is receiving.

undercoverage

When some groups in the population are left out of the process of choosing the sample.

least-squares regression line

The line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

68-95-99.7 rule

In the Normal distribution with mean μ and standard deviation σ, approximately 68% of the observations fall within σ of the mean μ, approximately 95% of the observations fall within 2σ of μ, and approximately 99.7% of the observations fall within 3σ of μ.
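The 68-95-99.7 percentages are rounded areas under the Normal curve. They can be checked with `statistics.NormalDist`, and the same areas hold for any μ and σ. The helper name is hypothetical.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Proportion of a Normal(mu, sigma) distribution within k standard deviations of mu."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```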

five-number summary

Consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation of a distribution, written in order from smallest to largest.

interquartile range

The distance between the first and third quartiles.
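The five-number summary and the interquartile range can be computed together. This sketch assumes one common textbook convention: each quartile is the median of the observations below (or above) the overall median, and the median itself is excluded when n is odd. Software packages often use other quartile rules and may give slightly different values. The data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """(min, Q1, median, Q3, max); quartiles are medians of the lower and upper halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]  # leave out the median when n is odd
    return xs[0], statistics.median(lower), statistics.median(xs), statistics.median(upper), xs[-1]

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]   # hypothetical, n = 9
summary = five_number_summary(data)
iqr = summary[3] - summary[1]            # interquartile range: Q3 - Q1
print(summary, iqr)
```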

variance

An average of the squares of the deviations of a set of observations from their mean. Equal to the standard deviation squared.
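The "average" here is usually taken with divisor n − 1 rather than n when computing the sample variance s². The sketch below assumes that convention and checks the result against Python's `statistics.variance`. The standard deviation is the square root of the variance. The data are hypothetical.

```python
import math
import statistics

def sample_variance(data):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations, mean 5
s2 = sample_variance(data)
s = math.sqrt(s2)                  # standard deviation = square root of variance
print(round(s2, 4), round(s, 4), statistics.variance(data))
```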