
AP Statistics Review

Review for final AP Statistics Examination...


Addition Rule
P(A ∪ B) = P(A) + P(B) - P(A ∩ B); used to compute the probability that at least one of two events occurs.
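One way to check the addition rule is to count outcomes in a small, equally likely sample space. This sketch assumes one roll of a fair six-sided die, with A = "even" and B = "greater than 3"; both events are illustrative choices.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (illustrative assumption).
S = {1, 2, 3, 4, 5, 6}
A = {x for x in S if x % 2 == 0}  # even: {2, 4, 6}
B = {x for x in S if x > 3}       # greater than 3: {4, 5, 6}

def prob(event):
    """Equally likely outcomes: P(E) = |E| / |S|."""
    return Fraction(len(event), len(S))

direct = prob(A | B)                         # count the union directly
via_rule = prob(A) + prob(B) - prob(A & B)   # addition rule
print(direct, via_rule)  # 2/3 2/3
```

Subtracting P(A ∩ B) matters here because the outcomes 4 and 6 are in both events and would otherwise be counted twice.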
Alpha (α)
The probability of a Type I error. See significance level.
Alternative Hypothesis
The hypothesis stating what the researcher is seeking evidence of. A statement of inequality. It can be one-sided, looking for a difference or change from the null hypothesis in one direction, or two-sided, looking for a difference in either direction.
Association
A relationship between or among variables.
Back-Transforming
The process of substituting values into a model fitted to transformed data and then reversing the transformation to obtain the predicted value or model for the untransformed data.
Bar Chart
A graphical display used with categorical data, where frequencies for each category are shown in vertical bars.
Bell-Shaped
Often used to describe the normal distribution. See mound-shaped.
Beta (β)
The probability of a Type II error. See power.
Bias
Systematic deviation from the truth (the parameter), caused by systematically favoring some outcomes over others.
A sampling method is biased if it tends to produce samples that do not represent the population.
Bimodal
A distribution with two clear peaks.
Binomial Distribution
The probability distribution of a binomial random variable.
Binomial Random Variable
A random variable X that counts the number of successes in n trials of a random phenomenon, where (a) the number of trials n is fixed, (b) each trial has only two possible outcomes, (c) the probability of a success is the same on every trial, and (d) each trial is independent of the other trials.
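The binomial probabilities can be computed directly from P(X = k) = C(n, k)pᵏ(1-p)ⁿ⁻ᵏ, and summing over the distribution reproduces the mean np and variance np(1-p) given elsewhere in this set. The values n = 10 and p = 0.3 are illustrative assumptions.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # illustrative number of trials and success probability
dist = [binomial_pmf(k, n, p) for k in range(n + 1)]  # P(X = 0), ..., P(X = n)

mean = sum(k * pk for k, pk in enumerate(dist))               # Σ k P(k)
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(dist))  # Σ (k - μ)² P(k)
print(round(sum(dist), 10), round(mean, 10), round(var, 10))  # 1.0 3.0 2.1
```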
Bins
The intervals that define the "bars" of a histogram.
Bivariate Data
Consists of two variables, an explanatory and a response variable, usually quantitative.
Blinding
The practice of not letting subjects know which treatment is imposed upon them.
Blocks
Subgroups of the experimental units that are separated by some characteristic before treatments are assigned because they may respond differently to the treatments.
Box-And-Whisker Plot/Boxplot
A graphical display of the five-number summary of a set of data, which also shows outliers.
Categorical Variable
A variable recorded as labels, names, or other non-numerical outcomes.
Census
A study that observes, or attempts to observe, every individual in a population.
Central Limit Theorem
As the size n of a simple random sample increases, the shape of the sampling distribution of x̄ becomes approximately normal, regardless of the shape of the population distribution.
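A small simulation can make the Central Limit Theorem concrete. This sketch assumes a right-skewed exponential population with mean 1 and standard deviation 1 (an illustrative choice), draws 5000 samples of size n = 30, and checks that the sample means center near 1, have spread near σ/√n ≈ 0.183, and put roughly 68% of their values within one standard deviation of the center, as a normal shape would.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

def sample_means(n, reps=5000):
    """Draw `reps` samples of size n from a right-skewed population and
    return the mean of each. The population is exponential with
    mean 1 and standard deviation 1 (an illustrative choice)."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

means = sample_means(n=30)
center = statistics.fmean(means)   # should be close to the population mean, 1
spread = statistics.stdev(means)   # should be close to 1 / sqrt(30)
# Share of sample means within one standard deviation of their center;
# near 0.68 if the sampling distribution is approximately normal.
share = sum(abs(m - center) <= spread for m in means) / len(means)
print(round(center, 3), round(spread, 3), round(share, 3))
```

Rerunning with a small n (for example n = 2) typically shows a visibly skewed set of means, which is the contrast the theorem describes.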
Chance Device
A mechanism used to determine random outcomes.
Cluster Sample
A sample in which a simple random sample of heterogeneous subgroups of a population is selected.
Clusters
Heterogeneous subgroups of a population.
Coefficient of Determination (r²)
Percent of variation in the response variable explained by its linear relationship with the explanatory variable.
Complement
The complement of an event is that event not occurring.
Completely Randomized Design
One in which all experimental units are assigned treatments solely by chance.
Conditional Distribution
See conditional frequencies.
Conditional Frequencies
Relative frequencies for each cell in a two-way table relative to one variable.
Conditional Probability
The probability of an event occurring given that another has occurred. The probability of A given that B has occurred is denoted as P(A|B).
Confidence Intervals
An estimated range of values that is likely to contain an unknown population parameter.
Confidence Level
How confident we are in the method: if many samples were taken and an interval calculated from each, about C% of those intervals would contain the true population parameter.
Confounding
The situation where the effects of two or more explanatory variables on the response variable cannot be separated.
Confounding Variable
A variable whose effect on the response variable cannot be untangled from the effects of the treatment.
Contingency Table
See two-way table.
Continuous Random Variables
Those typically found by measuring, such as heights or temperatures.
Control Group
A baseline group that may be given no treatment, a faux treatment like a placebo, or an accepted treatment that is to be compared to another.
Control
The principle that potential sources of variation due to variables not under consideration must be reduced.
Convenience Sample
Composed of individuals who are easily accessed or contacted.
Correlation Coefficient (r)
A measure of the direction and strength of a linear relationship between two quantitative variables; -1 ≤ r ≤ 1.
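A common textbook formula for r averages the products of paired z-scores: r = (1/(n-1))·Σ z_x z_y, using sample standard deviations. The sketch below applies it to five hypothetical (x, y) pairs and squares the result to get the coefficient of determination r².

```python
import statistics

def correlation(x, y):
    """r = (1 / (n - 1)) * sum of z_x * z_y, with sample standard deviations."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4, 5]  # hypothetical explanatory values
y = [2, 4, 5, 4, 5]  # hypothetical response values
r = correlation(x, y)
print(round(r, 4), round(r**2, 4))  # 0.7746 0.6
```

Here r² = 0.6, so about 60% of the variation in y is explained by its linear relationship with x.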
Critical Value
The value that the test statistic must exceed in order to reject the null hypothesis. When computing a confidence interval, the value t* (or z*) such that ±t* (or ±z*) bounds the central C% of the t (or z) distribution.
Cumulative Frequency
The running total of frequencies, accumulated from the smallest data value up to and including each value.
Data Set
Collection of observations from a sample or population.
Dependent Events
Two events are called dependent when they are related and the fact that one event has occurred changes the probability that the second event occurs.
Discrete Random Variables
Those usually obtained by counting.
Disjoint Events
Events that cannot occur simultaneously.
Distribution
The values in a data set and their frequencies.
Dot Plot
A graphical display used with univariate data. Each data point is shown as a dot located above its numerical value on the horizontal axis.
Double-Blind
An experiment in which neither the subjects nor the people gathering data know which treatment a subject received.
Empirical Rule (68-95-99.7 Rule)
Gives benchmarks for understanding how probability is distributed under a normal curve. In a normal distribution, approximately 68% of the observations fall within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations.
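The 68-95-99.7 benchmarks can be checked against exact standard normal areas. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the area within k standard deviations is Φ(k) - Φ(-k).

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}  # area within ±k SDs
for k, area in within.items():
    print(k, round(area, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```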
Estimation
The process of determining the value of a population parameter from a sample statistic.
Expected Value
The mean of a probability distribution.
Experiment
A study in which the researcher deliberately imposes conditions on individuals and measures their responses to those conditions.
Experimental Units
Individuals (a person, a plot of land, a machine, or any single material unit) in an experiment.
Explanatory Variable
Explains the response variable, sometimes known as the treatment variable.
Exponential Model
A model of the form y = abˣ.
Extrapolation
Using a model to predict values far outside the range of the explanatory variable, which is prone to creating unreasonable predictions.
Factors
One or more explanatory variables in an experiment.
First Quartile
Symbolized Q1, represents the median of the lower 50% of a data set.
Five-Number Summary
The minimum, first quartile (Q1), median, third quartile (Q3), and maximum values in a data set.
Frequency Table
A display organizing categorical or numerical data and how often each occurs.
Geometric Distribution
The probability distribution of a geometric random variable X: all possible values of X (the number of trials up to and including the first success) and their associated probabilities.
Geometric Random Variable
A random variable X that counts the number of trials needed to obtain the first success, where (a) each trial has only two possible outcomes, (b) the probability of a success is the same on every trial, and (c) each trial is independent of the other trials.
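Under the convention that X counts trials up to and including the first success, P(X = k) = (1-p)ᵏ⁻¹p for k = 1, 2, 3, ..., and the mean is 1/p. This sketch assumes p = 0.2 (illustrative) and truncates the infinite sums at k = 500, where the remaining tail is negligible.

```python
def geometric_pmf(k, p):
    """P(X = k): k - 1 failures followed by the first success (k = 1, 2, 3, ...)."""
    return (1 - p) ** (k - 1) * p

p = 0.2             # illustrative success probability
ks = range(1, 501)  # truncate the infinite sum; the tail beyond k = 500 is negligible here
total = sum(geometric_pmf(k, p) for k in ks)     # should be essentially 1
mean = sum(k * geometric_pmf(k, p) for k in ks)  # should be essentially 1 / p
print(round(total, 10), round(mean, 6), 1 / p)   # 1.0 5.0 5.0
```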
Graphical Display
A visual representation of a distribution.
Histogram
A graphical display used with univariate data: frequencies are shown on the vertical axis, and intervals, or bins, define the values on the horizontal axis.
Independent Events
Two events are called independent when knowing that one event has occurred does not change the probability that the second event occurs.
Independent Random Variables
If the values of one random variable have no association with the values of another, the two variables are called independent random variables.
Influential Point
An extreme value whose removal would drastically change the slope of the least-squares regression model.
Interquartile Range
Describes the spread of the middle 50% of a data set; IQR = Q3 - Q1.
Joint Distribution
See joint frequencies.
Joint Frequencies
Frequencies for each cell in a two-way table relative to the total number of observations.
Law of Large Numbers
The long-run relative frequency of an event gets closer to its true probability as the number of trials of a random phenomenon increases.
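The Law of Large Numbers can be watched in a simulated sequence of fair-coin flips: the running proportion of heads wanders early on and settles near 0.5 as the number of trials grows. The coin, the seed, and the checkpoints are illustrative assumptions.

```python
import random

random.seed(2)  # reproducible run
heads = 0
running = {}  # number of trials -> relative frequency of heads so far
for i in range(1, 100_001):
    heads += random.random() < 0.5  # one fair-coin trial; True adds 1
    if i in (10, 100, 1_000, 10_000, 100_000):
        running[i] = heads / i
        print(i, running[i])
```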
Least-Squares Regression Line (LSRL)
The "best-fit" line, calculated by minimizing the sum of the squared differences between the observed and predicted values of the response variable (the squared residuals). The LSRL has the equation ŷ = b₀ + b₁x.
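The least-squares slope and intercept have closed forms: b₁ = Σ(xᵢ-x̄)(yᵢ-ȳ) / Σ(xᵢ-x̄)² and b₀ = ȳ - b₁x̄, so the line always passes through (x̄, ȳ). This sketch fits five hypothetical data points and computes residuals (observed minus predicted), which sum to zero for a least-squares line.

```python
import statistics

def lsrl(x, y):
    """Return (b0, b1) for ŷ = b0 + b1*x, minimizing the sum of squared residuals."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx  # forces the line through (x̄, ȳ)
    return b0, b1

x = [1, 2, 3, 4, 5]  # hypothetical explanatory values
y = [2, 4, 5, 4, 5]  # hypothetical response values
b0, b1 = lsrl(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # observed - predicted
print(round(b0, 4), round(b1, 4))  # 2.2 0.6
```

The residuals here are -0.8, 0.6, 1.0, -0.6, and -0.2; their sum is zero up to floating-point error.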
Levels
The different quantities or categories of a factor in an experiment.
Linear Regression
A method of finding the best model for a linear relationship between the explanatory and response variable.
Logarithmic Transformation
Procedure that changes a variable by taking the logarithm of each of its values.
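Taking logarithms turns an exponential model y = abˣ into a line, ln(y) = ln(a) + x·ln(b), so a least-squares fit on (x, ln y) followed by back-transforming the coefficients recovers a and b. The data below are illustrative and follow y = 3·2ˣ exactly, so the fit should return a = 3 and b = 2.

```python
import math
import statistics

# Illustrative data that follow y = 3 * 2**x exactly.
x = [0, 1, 2, 3, 4]
y = [3, 6, 12, 24, 48]

# Step 1: log-transform the response; ln(y) = ln(a) + x * ln(b) is linear in x.
ln_y = [math.log(v) for v in y]

# Step 2: least-squares line for (x, ln y).
mx, my = statistics.fmean(x), statistics.fmean(ln_y)
b1 = sum((xi - mx) * (li - my) for xi, li in zip(x, ln_y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

# Step 3: back-transform the coefficients to get y = a * b**x.
a, b = math.exp(b0), math.exp(b1)
print(round(a, 6), round(b, 6))  # 3.0 2.0
```

With real, noisy data the same steps apply, but the back-transformed model will only approximate the points.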
Lurking Variable
A variable that has an effect on the outcome of a study but was not part of the investigation.
Margin of Error
The distance on either side of a point estimate that defines a confidence interval: point estimate ± margin of error.
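For a sample proportion, one common large-sample interval is p̂ ± z*√(p̂(1-p̂)/n), where the margin of error is the z*√(p̂(1-p̂)/n) part. This sketch assumes that one-proportion z interval and a hypothetical survey in which 540 of 1000 respondents answer "yes"; the critical value comes from `statistics.NormalDist.inv_cdf`.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_interval(successes, n, confidence=0.95):
    """Large-sample interval p̂ ± z* * sqrt(p̂(1 - p̂) / n); returns (low, high, margin)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value z*
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)          # margin of error
    return p_hat - margin, p_hat + margin, margin

low, high, me = one_proportion_interval(540, 1000)  # hypothetical survey data
print(round(low, 4), round(high, 4), round(me, 4))  # 0.5091 0.5709 0.0309
```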
Marginal Distribution
See marginal frequencies.
Marginal Frequencies
Row totals and column totals in a two-way table.
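Joint, marginal, and conditional frequencies all come from the same two-way table; they differ only in what each count is divided by. The table below is hypothetical (two groups, a yes/no response), and `Fraction` keeps the relative frequencies exact.

```python
from fractions import Fraction

# Hypothetical two-way table of counts: rows = group, columns = response.
table = {
    "Group A": {"yes": 30, "no": 20},
    "Group B": {"yes": 10, "no": 40},
}
total = sum(sum(row.values()) for row in table.values())

# Marginal frequencies: row totals and column totals.
row_totals = {g: sum(row.values()) for g, row in table.items()}
col_totals = {c: sum(row[c] for row in table.values()) for c in ("yes", "no")}

# Joint relative frequencies: each cell divided by the grand total.
joint = {(g, c): Fraction(n, total) for g, row in table.items() for c, n in row.items()}

# Conditional relative frequencies: each cell divided by its row total.
cond_given_row = {(g, c): Fraction(n, row_totals[g]) for g, row in table.items() for c, n in row.items()}

print(row_totals, col_totals)              # {'Group A': 50, 'Group B': 50} {'yes': 40, 'no': 60}
print(joint[("Group A", "yes")])           # 3/10
print(cond_given_row[("Group A", "yes")])  # 3/5
```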
Matched-Pairs Design
The design of a study where experimental units are naturally paired by a common characteristic, or with themselves in a before-after type of study.
Maximum
The largest numerical value in a data set.
Mean
The arithmetic average of a data set; the sum of all the values divided by the number of values, x̄ = (Σxᵢ)/n.
Mean of a Binomial Random Variable X
μₓ = np.
Mean of a Discrete Random Variable
μₓ = Σᵢ₌₁ⁿ xᵢP(xᵢ).
Mean of a Geometric Random Variable
μₓ = 1/p, where p is the probability of a success on each trial.
Measures of Center
These locate the middle of a distribution. The mean and median are measures of center.
Median
The middle value of a data set; the equal areas point, where 50% of the data are at or below this value, and 50% of the data are at or above this value.
Minimum
The smallest numerical value in a data set.
Mound-Shaped
Resembles a hill or mound; a distribution that is symmetric and unimodal.
Multiplication Rule
P(A ∩ B) = P(A) * P(B|A); used when we are interested in the probability of two events occurring together, or in succession.
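A standard illustration of the multiplication rule is drawing two cards without replacement, where the second probability is conditional on the first draw. This sketch assumes a 52-card deck with 4 aces, computes P(both aces) = (4/52)(3/51), and confirms it by enumerating every ordered pair of distinct cards.

```python
from fractions import Fraction
from itertools import permutations

# A = "first card is an ace", B = "second card is an ace" (no replacement).
p_rule = Fraction(4, 52) * Fraction(3, 51)  # P(A) * P(B | A)

# Brute-force check over all ordered pairs of distinct cards.
deck = ["A"] * 4 + ["other"] * 48
pairs = list(permutations(range(52), 2))
both_aces = sum(deck[i] == "A" and deck[j] == "A" for i, j in pairs)
p_enum = Fraction(both_aces, len(pairs))
print(p_rule, p_enum)  # 1/221 1/221
```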
Multistage Sample
A sample resulting from multiple applications of cluster, stratified, and/or simple random sampling.
Mutually Exclusive Events
See disjoint events.
Nonresponse Bias
Bias that occurs when an individual selected to be in the sample is unwilling, or unable, to provide data.
Normal Distribution
A continuous probability distribution that appears in many situations, both natural and man-made. It has a bell-shape and the area under the normal density curve is always equal to 1.
Null Hypothesis
The hypothesis of no difference, no change, or no association. A statement of equality, usually written in the form H₀: parameter = hypothesized value.
Observational Study
Attempts to determine relationships between variables, but the researcher imposes no conditions as in an experiment.
Observed Values
Actual outcomes or data from a study or an experiment.
One-Way Table
A frequency table of one variable.
Outlier
An extreme value in a data set. Commonly identified as a value less than Q1 - 1.5·IQR or greater than Q3 + 1.5·IQR.
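The five-number summary, the IQR, and the 1.5·IQR outlier fences fit together in a few lines. This sketch assumes the convention that Q1 and Q3 are the medians of the lower and upper halves with the overall median excluded when n is odd; other tools (for example Python's `statistics.quantiles`) may use different quartile rules and give slightly different values. The data are illustrative.

```python
import statistics

def five_number_summary(data):
    """(min, Q1, median, Q3, max); Q1 and Q3 are medians of the lower and
    upper halves, excluding the middle value when n is odd."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2 :]
    return xs[0], statistics.median(lower), statistics.median(xs), statistics.median(upper), xs[-1]

def outliers(data):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

data = [5, 1, 9, 30, 4, 6, 8, 3, 7]  # illustrative data
print(five_number_summary(data))  # (1, 3.5, 6, 8.5, 30)
print(outliers(data))             # [30]
```

With IQR = 8.5 - 3.5 = 5, the fences are -4 and 16, so only 30 is flagged.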
Percentiles
Values that divide an ordered data set into 100 equal parts. An observation at the Pth percentile is higher than P percent of all observations.
Placebo
A faux treatment given in an experiment that resembles the real treatment under consideration.
Placebo Effect
A phenomenon where subjects show a response to a treatment merely because the treatment is imposed regardless of its actual effect.
Point Estimate
An approximate value that has been calculated for the unknown parameter.
Population
The collection of all individuals under consideration in a study.
Population Parameter
A characteristic or measure of a population.
Location of a data value relative to the population
Power
The probability of correctly rejecting the null hypothesis when it is in fact false. Equal to 1 - β. See beta and Type II error.
Power Model
A function of the form y = axᵇ.
Predicted Value
The value of the response variable predicted by a model for a given explanatory variable.
Probability
Describes the chance that a certain outcome of a random phenomenon will occur.
Probability Distribution
For a discrete random variable X, a list or function giving all n possible outcomes of the random variable (xᵢ) and their associated probabilities P(xᵢ).
Probability Sample
Composed of individuals selected by chance.
P-Value
The probability of observing a test statistic as extreme as, or more extreme than, the statistic obtained from a sample, under the assumption that the null hypothesis is true.
Quantitative Variable
A variable whose values are counts or measurements.
Random Digit Table
A chance device that is used to select experimental units or conduct simulations.
Random Phenomena
Those outcomes that are unpredictable in the short term, but nevertheless, have a long-term pattern.
Random Sample
A sample composed of individuals selected by chance.
Random Variables
Numerical outcome of a random phenomenon.
Randomization
The process by which treatments are assigned by a chance mechanism to the experimental units.
Randomized Block Design
First, units are sorted into subgroups or blocks, and then treatments are randomly assigned within the blocks.
Range
Calculated as the maximum value minus the minimum value in a data set.
Relative Frequency
The frequency of a value expressed as a proportion or percentage of the total number of observations.
Replication
The practice of reducing chance variation by assigning each treatment to many experimental units.
Residual
The observed value minus the predicted value of the response variable, y - ŷ.
Response Bias
Inaccurate data collected because of the manner in which an interview is conducted, the phrasing of questions, or the attitude of the respondent.
Response Variable
Measures the outcomes that have been observed.
Sample
A selected subset of a population from which data are gathered.
Sample Statistic
Result of a sample used to estimate a parameter.
Sample Survey
A study that collects information from a sample of a population in order to determine one or more characteristics of the population.
Sampling Distribution
The probability distribution of a sample statistic when a sample is drawn from a population.
Sampling Distribution of the Sample Mean (x̄)
The distribution of sample means from all possible simple random samples of size n taken from a population.
Sampling Distribution of a Sample Proportion p̂
The distribution of sample proportions from all possible simple random samples of size n taken from a population.
Sampling Error
See sampling variability.
Sampling Variability
Natural variability due to the sampling process. Each possible random sample from a population will generate a different sample statistic.
Scatterplot
Used to visualize bivariate data. The explanatory variable is shown on the horizontal axis and the response variable is shown on the vertical axis.
Significance Level
The probability of a Type I error. A benchmark against which the P-value is compared to determine whether the null hypothesis will be rejected. See also alpha.
Simple Random Sample (SRS)
A sample where n individuals are selected from a population in a way that every possible combination of n individuals is equally likely.
Simulation
A method of modeling chance behavior that accurately mimics the situation being considered.
Skewed
A unimodal, asymmetric distribution that tends to slant: most of the data are clustered on one side of the distribution, and it "tails" off on the other side.
Standard Deviation of a Binomial Random Variable X
σₓ = √(np(1-p)).
Standard Deviation of a Discrete Random Variable X
σₓ = √(Σᵢ₌₁ⁿ (xᵢ-μₓ)²P(xᵢ)).
Standard Deviation
Used to measure variability of a data set. It is calculated as the square root of the variance of a set of data,
s = √(Σ(xᵢ-x̄)²/(n-1)).
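The sample standard deviation formula can be written out directly and compared with `statistics.stdev`, which also divides by n - 1. The data set is illustrative.

```python
import statistics
from math import sqrt

def sample_sd(data):
    """s = sqrt(sum of squared deviations from the mean / (n - 1))."""
    n = len(data)
    mean = sum(data) / n
    return sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data; mean 5, squared deviations sum to 32
print(round(sample_sd(data), 4), round(statistics.stdev(data), 4))  # 2.1381 2.1381
```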
Standard Error
An estimate of the standard deviation of the sampling distribution of a statistic.
Standard Normal Probabilities
The probabilities calculated from values of the standard normal distribution.
Standardized Score
The number of standard deviations an observation lies from the mean,
z = (observation - mean) / (standard deviation).
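A z-score converts an observation to standard deviation units; if the distribution is normal, the standard normal cumulative probability at that z gives the proportion of values below the observation. The score of 85, mean of 70, and standard deviation of 10 are illustrative assumptions.

```python
from statistics import NormalDist

def z_score(observation, mean, sd):
    """Number of standard deviations the observation lies from the mean."""
    return (observation - mean) / sd

z = z_score(85, 70, 10)              # illustrative values
print(z)                             # 1.5
print(round(NormalDist().cdf(z), 4)) # 0.9332, proportion below 85 if the distribution is normal
```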
Statistically Significant
Describes a result in which a sample statistic is far enough from a hypothesized parameter that chance alone is an unlikely explanation; that is, the P-value is less than the significance level.
Stemplot
Also called a stem-and-leaf plot. Data are split into a stem and a leaf by place value and arranged so the display resembles a histogram turned on its side.
Strata
Subgroups of a population that are similar or homogeneous.
Stratification
The part of the sampling process in which units of the study are separated into strata.
Stratified Random Sample
A sample in which simple random samples are selected from each of several homogeneous subgroups of the population, known as strata.
Subjects
Individuals in an experiment who are people.
Symmetric
A distribution that resembles a mirror image on either side of the center.
Systematic Random Sample
A sample where every kth individual is selected from a list or queue.
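The simple random, stratified, and systematic designs can be compared side by side with Python's `random` module. This sketch assumes a hypothetical population of IDs 1 to 100, two illustrative strata (the first and second halves of the list), and a systematic sample that begins at a random position within the first k individuals, a common convention the definition above does not spell out.

```python
import random

random.seed(3)
population = list(range(1, 101))  # hypothetical IDs 1..100

# Simple random sample: every set of 10 individuals is equally likely.
srs = random.sample(population, 10)

# Stratified random sample: an SRS of 5 within each stratum (illustrative strata).
strata = {"first half": population[:50], "second half": population[50:]}
stratified = [unit for group in strata.values() for unit in random.sample(group, 5)]

# Systematic random sample: random start, then every kth individual.
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

print(srs, stratified, systematic, sep="\n")
```

The stratified design guarantees that both strata are represented, while an SRS of the same size may, by chance, draw mostly from one half.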
Test Statistic
The number of standard deviations (standard errors) that a sample statistic lies from a hypothesized population parameter.
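For a one-proportion z test, the test statistic is (p̂ - p₀) divided by the standard error computed under H₀, and the P-value is the tail area beyond it. This sketch assumes a two-sided test of H₀: p = 0.5, hypothetical data of 540 successes in 1000 trials, and α = 0.05.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Return (z, two-sided P-value) for H0: p = p0 versus Ha: p != p0."""
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se0         # standard errors between p̂ and p0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_proportion_z_test(540, 1000, 0.5)  # hypothetical data
alpha = 0.05
print(round(z, 2), round(p_value, 4), p_value < alpha)
```

Here z is about 2.53 and the P-value is about 0.011, which is below α = 0.05, so the result would be called statistically significant and H₀ rejected.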
Third Quartile
Symbolized Q3, represents the median of the upper 50% of a data set.
Transforming Data
Changing the values of a data set using a mathematical operation.
Treatments
Combinations of different levels of the factors in an experiment.
Two-Way Table
A frequency table that displays two categorical variables.
Type I Error
Rejecting a null hypothesis when it is in fact true.
Type II Error
Failing to reject a null hypothesis when it is in fact false.
Undercoverage
Occurs when some individuals of a population are not included in the sampling process.
Uniform
All data values in the distribution have similar frequencies.
Unimodal
A distribution with a single, clearly defined peak.
Univariate Data
One-variable data.
Variables
Characteristics of the individuals under study.
Variability
The spread in a data set.
Variance
Used to measure variability; the average of the squared deviations from the mean,
s²ₓ = Σ(xᵢ-x̄)²/(n-1).
Variance of a Binomial Random Variable X
σ²ₓ = np(1-p).
Variance of a Discrete Random Variable X
σ²ₓ = Σᵢ₌₁ⁿ (xᵢ-μₓ)²P(xᵢ).
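The mean, variance, and standard deviation of a discrete random variable all come from weighting by P(xᵢ). This sketch assumes an illustrative distribution, X = number of heads in two fair coin flips, and uses `Fraction` so the results are exact.

```python
from fractions import Fraction
from math import sqrt

# Illustrative distribution: X = number of heads in two fair coin flips.
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
assert sum(dist.values()) == 1  # probabilities must sum to 1

mu = sum(x * p for x, p in dist.items())               # μ = Σ xᵢ P(xᵢ)
var = sum((x - mu) ** 2 * p for x, p in dist.items())  # σ² = Σ (xᵢ - μ)² P(xᵢ)
print(mu, var, round(sqrt(var), 4))  # 1 1/2 0.7071
```

Because this X is also binomial with n = 2 and p = 1/2, the results agree with μₓ = np = 1 and σ²ₓ = np(1-p) = 1/2.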
Venn Diagram
Graphical representation of sets or outcomes and how they intersect.
Voluntary Response Bias
Bias due to the manner in which people choose to respond to voluntary surveys.
Voluntary Response Sample
Composed of individuals who choose to respond to a survey because of interest in the subject.
z-Score
See standardized score.