AP Statistics Review
Review for final AP Statistics Examination...
P(A ∪ B) = P(A) + P(B) - P(A ∩ B) aids in computing the chance that at least one of several events occurs.
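A minimal sketch of the addition rule, assuming a hypothetical example (one card drawn from a standard 52-card deck) that is not part of the original set. The variable names are illustrative only.

```python
from fractions import Fraction

# Hypothetical example: draw one card from a standard 52-card deck.
p_heart = Fraction(13, 52)           # P(A): the card is a heart
p_face = Fraction(12, 52)            # P(B): the card is a face card (J, Q, K)
p_heart_and_face = Fraction(3, 52)   # P(A ∩ B): J, Q, or K of hearts

# Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
p_heart_or_face = p_heart + p_face - p_heart_and_face
print(p_heart_or_face)  # 11/26
```

Subtracting P(A ∩ B) keeps the three face cards that are also hearts from being counted twice.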
The probability of a Type I error. See significance level.
The hypothesis stating what the researcher is seeking evidence of. A statement of inequality. It can be written looking for the difference or change in one direction from the null hypothesis or both.
Relationship between or among variables.
The process by which values are substituted into a model of transformed data, and then reversing the transforming process to obtain the predicted value or model for nontransformed data.
A graphical display used with categorical data, where frequencies for each category are shown in vertical bars.
Often used to describe the normal distribution. See mound-shaped.
The probability of a Type II error. See power.
The term for systematic deviation from the truth (parameter), caused by systematically favoring some outcomes over others.
A sampling method is biased if it tends to produce samples that do not represent the population.
A distribution with two clear peaks.
The probability distribution of a binomial random variable.
Binomial Random Variable
A random variable X that counts the number of successes in a setting (a) that has a fixed number n of trials of a random phenomenon, (b) that has only two possible outcomes on each trial, (c) for which the probability of a success is constant for each trial, and (d) for which each trial is independent of other trials.
The intervals that define the "bars" of a histogram.
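The four conditions above lead to the standard binomial probability formula, P(X = k) = C(n, k) pᵏ (1 - p)ⁿ⁻ᵏ, which this set does not state explicitly. The sketch below assumes hypothetical values n = 10 and p = 0.5, and the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical setting: 10 flips of a fair coin, X = number of heads.
n, p = 10, 0.5
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The probabilities sum to 1, and the mean agrees with the shortcut n * p.
total = sum(probs)
mean = sum(k * pk for k, pk in enumerate(probs))
print(total, mean)
```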
Consists of two variables, an explanatory and a response variable, usually quantitative.
Practice of denying knowledge to subjects about which treatment is imposed upon them.
Subgroups of the experimental units that are separated by some characteristic before treatments are assigned because they may respond differently to the treatments.
A graphical display of the five-number summary of a set of data, which also shows outliers.
A variable recorded as labels, names, or other non-numerical outcomes.
A study that observes, or attempts to observe, every individual in a population.
Central Limit Theorem
As the size n of a simple random sample increases, the shape of the sampling distribution of x̄ tends toward being normally distributed.
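A small simulation can make this concrete. The sketch below assumes a hypothetical right-skewed population (exponential with mean 1 and standard deviation 1) and a fixed seed; neither choice comes from the original set.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of one random sample of size n from a right-skewed population."""
    # Assumption: exponential population with mean 1 and standard deviation 1.
    return mean(random.expovariate(1.0) for _ in range(n))

n = 40
sample_means = [sample_mean(n) for _ in range(2000)]

center = mean(sample_means)   # should land near the population mean, 1
spread = stdev(sample_means)  # should land near 1 / sqrt(n), about 0.158
print(round(center, 3), round(spread, 3))
```

Even though the population is strongly skewed, a histogram of `sample_means` would look roughly mound-shaped for a sample size this large.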
A mechanism used to determine random outcomes.
A sample in which a simple random sample of heterogeneous subgroups of a population is selected.
Heterogeneous subgroups of a population.
Coefficient of Determination (r²)
Percent of variation in the response variable explained by its linear relationship with the explanatory variable.
The complement of an event is that event not occurring.
Completely Randomized Design
One in which all experimental units are assigned treatments solely by chance.
See conditional frequencies.
Relative frequencies for each cell in a two-way table relative to one variable.
The probability of an event occurring given that another has occurred. The probability of A given that B has occurred is denoted as P(A|B).
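As a rough illustration, a conditional probability can be read from a two-way table by restricting attention to one row. The counts and labels below are hypothetical, and `conditional_probability` is an illustrative helper, not a library function.

```python
from fractions import Fraction

# Hypothetical counts: rows are class year, columns are whether a student has a job.
table = {
    "junior": {"job": 30, "no job": 20},
    "senior": {"job": 40, "no job": 10},
}

def conditional_probability(table, outcome, given_row):
    """P(outcome | given_row): look only at one row, then take the proportion."""
    row = table[given_row]
    return Fraction(row[outcome], sum(row.values()))

p_job_given_senior = conditional_probability(table, "job", "senior")
print(p_job_given_senior)  # 4/5
```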
Give an estimated range that is likely to contain an unknown population parameter.
The level of certainty that a population parameter exists in the calculated confidence interval.
The situation where the effects of two or more explanatory variables on the response variable cannot be separated.
A variable whose effect on the response variable cannot be untangled from the effects of the treatment.
See two-way table.
Continuous Random Variables
Those typically found by measuring, such as heights or temperatures.
A baseline group that may be given no treatment, a faux treatment like a placebo, or an accepted treatment that is to be compared to another.
The principle that potential sources of variation due to variables not under consideration must be reduced.
Composed of individuals who are easily accessed or contacted.
Correlation Coefficient (r)
A measure of the strength and direction of a linear relationship.
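One common way to compute r is to divide the sum of cross-deviations by the square root of the product of the squared-deviation sums. The sketch below assumes that standard formula (it is not given in this set) and a small hypothetical data set.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Correlation coefficient r for paired quantitative data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical bivariate data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_squared = r ** 2  # coefficient of determination
print(round(r, 4), round(r_squared, 2))
```

For these values r is about 0.77, so roughly 60% of the variation in y is explained by its linear relationship with x.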
The value that the test statistic must exceed in order to reject the null hypothesis. When computing a confidence interval, the value of t* (or z*) where ±t* (or ±z*) bounds the central C% of the t (or z) distribution.
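For the z case, the critical value can be found from the inverse of the standard normal cumulative distribution. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `z_star` is an assumption for illustration. The standard library has no t distribution, so t* is not covered here.

```python
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* that bounds the central `confidence` share of the z distribution."""
    tail = (1 - confidence) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_star(level), 3))
```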
The sums of the frequencies of the data values from smallest to largest.
Collection of observations from a sample or population.
Two events are called dependent when they are related and the fact that one event has occurred changes the probability that the second event occurs.
Discrete Random Variables
Those usually obtained by counting.
Events that cannot occur simultaneously.
Frequencies of values in a data set.
A graphical display used with univariate data. Each data point is shown as a dot located above its numerical value on the horizontal axis.
When both the subjects and data gatherers are ignorant about which treatment a subject received.
Empirical Rule (68-95-99.7 Rule)
Gives benchmarks for understanding how probability is distributed under a normal curve. In the normal distribution, about 68% of the observations are within one standard deviation of the mean, about 95% are within two standard deviations of the mean, and about 99.7% are within three standard deviations of the mean.
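The benchmarks can be checked numerically against the standard normal curve. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the dictionary name `within` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area under the standard normal curve within k standard deviations of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(k, round(100 * area, 1))
```

The printed percentages round to the familiar 68, 95, and 99.7 figures, which is why the rule is stated as an approximation.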
The process of determining the value of a population parameter from a sample statistic.
The mean of a probability distribution.
A study where the researcher deliberately influences individuals by imposing conditions and determining the individuals' responses to those conditions.
Individuals (a person, a plot of land, a machine, or any single material unit) in an experiment.
Explains the response variable, sometimes known as the treatment variable.
A model of the form y = abˣ.
Using a model to predict values far outside the range of the explanatory variable, which is prone to creating unreasonable predictions.
One or more explanatory variables in an experiment.
Symbolized Q1, represents the median of the lower 50% of a data set.
The minimum, first quartile (Q1), median, third quartile (Q3), and maximum values in a data set.
A display organizing categorical or numerical data and how often each occurs.
The probability distribution of a geometric random variable X. All possible outcomes of X before the first success is seen and their associated probabilities.
Geometric Random Variable
A random variable X (a) that has two possible outcomes of each trial, (b) for which the probability of a success is constant for each trial, and (c) for which each trial is independent of the other trials.
A visual representation of a distribution.
Used with univariate data, frequencies are shown on the vertical axis, and intervals or bins define the values on the horizontal axis.
Two events are called independent when knowing that one event has occurred does not change the probability that the second event occurs.
Independent Random Variables
If the values of one random variable have no association with the values of another, the two variables are called independent random variables.
An extreme value whose removal would drastically change the slope of the least-squares regression model.
Describes the spread of middle 50% of a data set, IQR = Q3 - Q1.
See joint frequencies.
Frequencies for each cell in a two-way table relative to the total number of data.
Law of Large Numbers
The long-term relative frequency of an event gets closer to the true relative frequency as the number of trials of a random phenomenon increases.
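A short simulation illustrates the idea. The sketch assumes a hypothetical event with true probability 0.5 (such as heads on a fair coin) and a fixed seed; `relative_frequency` is an illustrative helper.

```python
import random

random.seed(2)  # fixed seed so the run is repeatable

def relative_frequency(num_trials, p=0.5):
    """Observed share of successes in num_trials independent trials."""
    successes = sum(1 for _ in range(num_trials) if random.random() < p)
    return successes / num_trials

# The observed relative frequency tends to settle near 0.5 as trials increase.
for trials in (10, 1000, 100000):
    print(trials, relative_frequency(trials))
```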
Least-Squares Regression Line (LSRL)
The "best-fit" line that is calculated by minimizing the sum of the squares of the differences between the observed and predicted values of the response variable. The LSRL has the equation ŷ = b₀ + b₁x.
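The slope and intercept can be sketched with the standard least-squares formulas, b₁ = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² and b₀ = ȳ - b₁x̄. These formulas are not spelled out in this set, and the data below are hypothetical.

```python
from statistics import mean

def lsrl(xs, ys):
    """Return (b0, b1) for the least-squares regression line y-hat = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical bivariate data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = lsrl(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # observed minus predicted
print(round(b0, 2), round(b1, 2))
```

The residuals of a least-squares line sum to zero (up to rounding), which is a quick sanity check on the fit.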
The different quantities or categories of a factor in an experiment.
A method of finding the best model for a linear relationship between the explanatory and response variable.
Procedure that changes a variable by taking the logarithm of each of its values.
A variable that has an effect on the outcome of a study but was not part of the investigation.
Margin of Error
A range of values to the left and right of a point estimate.
See marginal frequencies.
Row totals and column totals in a two-way table.
The design of a study where experimental units are naturally paired by a common characteristic, or with themselves in a before-after type of study.
The largest numerical value in a data set.
The arithmetic average of a data set; the sum of all the values divided by the number of values, x̄ = (Σxi)/n.
Mean of a Binomial Random Variable X
μx = np.
Mean of a Discrete Random Variable
μx = Σ from i=1 to n of xiP(xi).
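A minimal sketch of this sum, assuming a hypothetical random variable (X = number of heads in two fair coin flips) that is not part of the original set.

```python
from fractions import Fraction

# Hypothetical distribution: X = number of heads in two fair coin flips.
distribution = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# A valid probability distribution must sum to 1.
total_probability = sum(distribution.values())

# Mean: multiply each outcome by its probability, then add.
expected_value = sum(x * p for x, p in distribution.items())
print(expected_value)  # 1
```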
Mean of a Geometric Random Variable
μx = 1/p.
Measures of Center
These locate the middle of a distribution. The mean and median are measures of center.
The middle value of a data set; the equal areas point, where 50% of the data are at or below this value, and 50% of the data are at or above this value.
The smallest numerical value in a data set.
Resembles a hill or mound; a distribution that is symmetric and unimodal.
P(A ∩ B) = P(A) * P(B|A) is used when we are interested in the probability of two events occurring simultaneously, or in succession.
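A minimal sketch of the multiplication rule for events in succession, assuming a hypothetical example (drawing two aces in a row from a standard 52-card deck without replacement).

```python
from fractions import Fraction

# Hypothetical example: two cards drawn without replacement.
p_first_ace = Fraction(4, 52)                # P(A): first card is an ace
p_second_ace_given_first = Fraction(3, 51)   # P(B|A): second is an ace, given the first was

# Multiplication rule: P(A ∩ B) = P(A) * P(B|A)
p_both_aces = p_first_ace * p_second_ace_given_first
print(p_both_aces)  # 1/221
```

Because the first card is not replaced, the two draws are dependent, so P(B|A) differs from the unconditional 4/52.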
A sample resulting from multiple applications of cluster, stratified, and/or simple random sampling.
Mutually Exclusive Events
See disjoint events.
The situation where an individual selected to be in the sample is unwilling, or unable, to provide data.
A continuous probability distribution that appears in many situations, both natural and man-made. It has a bell-shape and the area under the normal density curve is always equal to 1.
The hypothesis of no difference, no change, and no association. A statement of equality, usually written in the form H₀: parameter = hypothesized value.
Attempts to determine relationships between variables, but the researcher imposes no conditions as in an experiment.
Actual outcomes or data from a study or an experiment.
A frequency table of one variable.
An extreme value in a data set. Quantified by being less than Q1 - 1.5 × IQR or more than Q3 + 1.5 × IQR.
Divide the data set into 100 equal parts. An observation at the Pth percentile is higher than P percent of all observations.
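The 1.5 × IQR rule can be sketched as follows. Quartile conventions vary between textbooks and software; this sketch assumes the convention used elsewhere in this set (Q1 and Q3 are the medians of the lower and upper halves), leaving out the overall median when the number of values is odd. The data are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3), with Q1 and Q3 as medians of the lower and upper halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    # Assumption: with an odd count, the middle value belongs to neither half.
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

def outliers(data):
    """Values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

data = [2, 4, 5, 7, 8, 9, 10, 12, 40]  # hypothetical data set
print(quartiles(data))  # (4.5, 8, 11.0)
print(outliers(data))   # [40]
```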
A faux treatment given in an experiment that resembles the real treatment under consideration.
A phenomenon where subjects show a response to a treatment merely because the treatment is imposed regardless of its actual effect.
An approximate value that has been calculated for the unknown parameter.
The collection of all individuals under consideration in a study.
A characteristic or measure of a population.
Location of a data value relative to the population.
The probability of correctly rejecting the null hypothesis when it is in fact false. Equal to 1 - β. See beta and Type II error.
A function in the form of y = axᵇ.
The value of the response variable predicted by a model for a given explanatory variable.
Describes the chance that a certain outcome of a random phenomenon will occur.
For a discrete random variable X, the function that pairs each of the n possible outcomes xi with its associated probability P(xi).
Composed of individuals selected by chance.
The probability of observing a test statistic as extreme as, or more extreme than, the statistic obtained from a sample, under the assumption that the null hypothesis is true.
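When the test statistic is a z-score, the P-value is a tail area under the standard normal curve. The sketch below assumes a hypothetical z = 2.0 and significance level 0.05; `p_value_from_z` and its `alternative` labels are illustrative names, not a library API.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative="two-sided"):
    """Tail area under the standard normal curve at least as extreme as z."""
    std_normal = NormalDist()
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    if alternative == "less":
        return std_normal.cdf(z)
    # Two-sided: count both tails beyond |z|.
    return 2 * (1 - std_normal.cdf(abs(z)))

p = p_value_from_z(2.0)   # hypothetical test statistic
alpha = 0.05              # hypothetical significance level
reject_null = p < alpha
print(round(p, 4), reject_null)
```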
A variable whose values are counts or measurements.
Random Digit Table
A chance device that is used to select experimental units or conduct simulations.
Those outcomes that are unpredictable in the short term, but nevertheless, have a long-term pattern.
A sample composed of individuals selected by chance.
Numerical outcome of a random phenomenon.
The process by which treatments are assigned by a chance mechanism to the experimental units.
Randomized Block Design
First, units are sorted into subgroups or blocks, and then treatments are randomly assigned within the blocks.
Calculated as the maximum value minus the minimum value in a data set.
Percentage or proportion of the whole number of data.
The practice of reducing chance variation by assigning each treatment to many experimental units.
Observed value minus predicted value of the response variable.
Because of the manner in which an interview is conducted, because of the phrasing of questions, or because of the attitude of the respondent, inaccurate data are collected.
Measures the outcomes that have been observed.
A selected subset of a population from which data are gathered.
Result of a sample used to estimate a parameter.
A study that collects information from a sample of a population in order to determine one or more characteristics of the population.
The probability distribution of a sample statistic when a sample is drawn from a population.
Sampling Distribution of the Sample Mean (x̄)
The distribution of sample means from all possible simple random samples of size n taken from a population.
Sampling Distribution of a Sample Proportion p̂
The distribution of sample proportions from all possible simple random samples of size n taken from a population.
See sampling variability.
Natural variability due to the sampling process. Each possible random sample from a population will generate a different sample statistic.
Used to visualize bivariate data. The explanatory variable is shown on the horizontal axis and the response variable is shown on the vertical axis.
The probability of a Type I error. A benchmark against which the P-value is compared to determine if the null hypothesis will be rejected. See also alpha.
Simple Random Sample (SRS)
A sample where n individuals are selected from a population in a way that every possible combination of n individuals is equally likely.
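In Python, `random.sample` draws without replacement so that every set of n individuals is equally likely. The population below (individuals labeled 1 to 100) and the seed are hypothetical.

```python
import random

random.seed(3)  # fixed seed so the run is repeatable

# Hypothetical population: individuals labeled 1 through 100.
population = list(range(1, 101))

# Simple random sample of n = 10 individuals, drawn without replacement.
srs = random.sample(population, k=10)
print(sorted(srs))
```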
A method of modeling chance behavior that accurately mimics the situation being considered.
A unimodal, asymmetric distribution that tends to slant: most of the data are clustered on one side of the distribution, which "tails" off on the other side.
Standard Deviation of a Binomial Random Variable X
σₓ = √(np(1-p)).
Standard Deviation of a Discrete Random Variable X
σₓ = √(Σ from i=1 to n of (xi-μₓ)²·P(xi)).
Used to measure variability of a data set. It is calculated as the square root of the variance of a set of data,
s = √(Σ(xi-x̄)²/(n-1)).
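A minimal sketch of the calculation on a hypothetical data set, checked against `statistics.stdev` from the Python standard library, which also divides by n - 1.

```python
from math import sqrt
from statistics import mean, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

x_bar = mean(data)
# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)
# Sample standard deviation: square root of the variance.
s = sqrt(variance)

print(round(variance, 3), round(s, 3))
print(round(stdev(data), 3))  # matches s
```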
An estimate of the standard deviation of the sampling distribution of a statistic.
Standard Normal Probabilities
The probabilities calculated from values of the standard normal distribution.
The number of standard deviations an observation lies from the mean,
z = (observation - mean) / (standard deviation).
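As a quick illustration with hypothetical numbers (a score of 85 in a distribution with mean 70 and standard deviation 10):

```python
def z_score(observation, mean, standard_deviation):
    """Number of standard deviations an observation lies from the mean."""
    return (observation - mean) / standard_deviation

z = z_score(85, 70, 10)
print(z)  # 1.5
```

A negative result would mean the observation lies below the mean.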
When a sample statistic is shown to be far from a hypothesized parameter. When the P-value is less than the significance level.
Also called a stem-and-leaf plot. Data are separated into a stem and leaf by place value and organized in the form of a histogram.
Subgroups of a population that are similar or homogeneous.
Part of the sampling process where units of the study are separated into strata.
Stratified Random Sample
A sample in which simple random samples are selected from each of several homogeneous subgroups of the population, known as strata.
Individuals in an experiment that are people.
The distribution that resembles a mirror image on either side of the center.
Systematic Random Sample
A sample where every kth individual is selected from a list or queue.
The number of standard deviations (standard errors) that a sample statistic lies from a hypothesized population parameter.
Symbolized Q3, represents the median of the upper 50% of a data set.
Changing the values of a data set using a mathematical operation.
Combinations of different levels of the factors in an experiment.
A frequency table that displays two categorical variables.
Type I Error
Rejecting a null hypothesis when it is in fact true.
Type II Error
Failing to reject a null hypothesis when it is in fact false.
When some individuals of a population are not included in the sampling process.
All data values in the distribution have similar frequencies.
A distribution with a single, clearly defined, peak.
Characteristics of the individuals under study.
The spread in a data set.
Used to measure variability, the average of the squared deviations from the mean,
s²ₓ = Σ(xi-x̄)²/(n-1).
Variance of a Binomial Random Variable X
σ²ₓ = np(1-p).
Variance of a Discrete Random Variable X
σ²ₓ = Σ from i=1 to n of (xi-μₓ)²·P(xi).
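A minimal sketch of this sum, assuming a hypothetical random variable (the face shown by one roll of a fair six-sided die) that is not part of the original set.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical distribution: one roll of a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Mean: sum of each outcome times its probability.
mu = sum(x * p for x, p in die.items())

# Variance: sum of squared deviations from the mean, each weighted by its probability.
variance = sum((x - mu) ** 2 * p for x, p in die.items())

# Standard deviation: square root of the variance.
sd = sqrt(variance)
print(mu, variance, round(sd, 3))  # 7/2 35/12 1.708
```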
Graphical representation of sets or outcomes and how they intersect.
Voluntary Response Bias
Bias due to the manner in which people choose to respond to voluntary surveys.
Voluntary Response Sample
Composed of individuals who choose to respond to a survey because of interest in the subject.
See standardized score.