AP Statistics Definitions
Terms in this set (168)
addition rule
P(A∪B)=P(A)+P(B)-P(A∩B), aids in computing the chance that at least one of several events occurs
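As a quick check of the addition rule P(A∪B)=P(A)+P(B)-P(A∩B), the sketch below compares the rule against a direct count. The fair die and the events A and B are illustrative assumptions, not part of the original card.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event A: the roll is even
B = {4, 5, 6}   # event B: the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

# Addition rule: subtract the overlap so it is not counted twice.
p_union_by_rule = prob(A) + prob(B) - prob(A & B)

# Direct count of outcomes that are in A or B (or both).
p_union_direct = prob(A | B)

print(p_union_by_rule, p_union_direct)
```

Both values agree because the outcomes 4 and 6 belong to A and B, so they would be counted twice without the subtraction.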
alpha (α)
the probability of a Type I error
alternative hypothesis
the hypothesis stating what the researcher is seeking evidence of; a statement of inequality that can be written to look for a difference or change from the null hypothesis in one direction (one-sided) or in both directions (two-sided)
association
relationship between or among variables
back-transform
the process by which values are substituted into a model of transformed data and then reversing the transforming process to obtain the predicted value or model for nontransformed data
bar chart
a graphical display used with categorical data, where frequencies for each category are shown in vertical bars
bell-shaped
often used to describe the normal distribution
beta (β)
the probability of a Type II error
bias
the term for systematic deviation from the truth (parameter) caused by systematically favoring some outcomes over others
biased
a sampling method is biased if it tends to produce samples that do not represent the population
bimodal
a distribution with two clear peaks
binomial distribution
the probability distribution of a binomial random variable
binomial random variable
a random variable X (a) that has a fixed number of trials of a random phenomenon n, (b) that has only two possible outcomes on each trial, (c)for which the probability of a success is constant for each trial, and (d) for which each trial is independent of other trials
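A minimal sketch of the probabilities such a variable produces, assuming n = 10 trials and success probability p = 0.3 (both values are made up for illustration). The helper name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3  # assumed example values

# Probability of each possible number of successes, 0 through n.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                        # should be 1 (up to rounding)
mean_from_pmf = sum(k * pk for k, pk in enumerate(pmf))  # should match n * p

print(f"total = {total:.6f}, mean = {mean_from_pmf:.6f}, n*p = {n * p:.6f}")
```

The probabilities sum to 1, and the mean computed from them matches the shortcut np.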
bins
the intervals that define the "bars" of a histogram
bivariate data
consists of two variables, an explanatory and a response variable, usually quantitative
blinding
practice of denying knowledge to subjects about which treatment is imposed upon them
blocks
subgroups of the experimental units that are separated by some characteristic before treatments are assigned because they may respond differently to the treatments
box-and-whisker plot/boxplot
a graphical display of the five-number summary of a set of data, which also shows outliers
categorical variables
a variable recorded as labels, names, or other non-numerical outcomes
census
a study that observes, or attempts to observe, every individual in a population
central limit theorem
as the size n of a simple random sample increases, the shape of the sampling distribution of x̄ tends toward being normally distributed
chance device
a mechanism used to determine random outcomes
cluster sample
a sample in which a simple random sample of heterogeneous subgroups of a population is selected
clusters
heterogeneous subgroups of a population
coefficient of determination (r²)
percent of variation in the response variable explained by its linear relationship with the explanatory variable
complement
the complement of an event is that event not occurring
complementary events
two disjoint events whose probabilities add up to 1
completely randomized design
one in which all experimental units are assigned treatments solely by chance
conditional frequencies
relative frequencies for each cell in a two-way table relative to one variable
conditional probability
the probability of an event occurring given that another has occurred, the probability of A given that B has occurred is denoted as P(A|B)
confidence intervals
give an estimated range that is likely to contain an unknown population parameter
confidence level
the long-run percentage of confidence intervals, computed by the same method from repeated samples, that would contain the population parameter
confounding
the situation where the effects of two or more explanatory variables on the response variable cannot be separated
confounding variable
a variable whose effect on the response variable cannot be untangled from the effects of the treatment
continuous random variables
those typically found by measuring, such as heights or temperatures
control group
a baseline group that may be given no treatment, a faux treatment like a placebo, or an accepted treatment that is to be compared to another
control
the principle that potential sources of variation due to variables not under consideration must be reduced
convenience sample
composed of individuals who are easily accessed or contacted
correlation coefficient (r)
a measure of the strength and direction of a linear relationship between two quantitative variables
critical value
the value that the test statistic must exceed in order to reject the null hypothesis; when computing a confidence interval, the value of t* (or z*) where ±t* (or ±z*) bounds the central C% of the t (or z) distribution
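For the z case, the critical value can be sketched with Python's standard library. The function name `z_star` is hypothetical; a t critical value would need a t distribution, which this sketch does not cover.

```python
from statistics import NormalDist

def z_star(confidence_level):
    """z critical value that bounds the central `confidence_level` of the standard normal curve."""
    tail = (1 - confidence_level) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail)      # z value with that much area above it

for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%} confidence: z* = {z_star(c):.3f}")
```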
cumulative frequency
the sums of the frequencies of the data values from smallest to largest
data set
collection of observations from a sample or population
dependent events
two events are called dependent when they are related and the fact that one event has occurred changes the probability that the second event occurs
discrete random variables
those usually obtained by counting
disjoint events
events that cannot occur simultaneously
distribution
frequencies of values in a data set
dotplot
a graphical display used with univariate data; each data point is shown as a dot located above its numerical value on the horizontal axis
double-blind
when both the subjects and data gatherers are ignorant about which treatment a subject received
empirical rule (68-95-99.7 rule)
gives benchmarks for understanding how probability is distributed under a normal curve; in the normal distribution, about 68% of the observations are within one standard deviation of the mean, about 95% are within two standard deviations of the mean, and about 99.7% are within three standard deviations of the mean
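The three benchmarks can be checked against the standard normal curve. This is a small sketch using Python's standard library; the helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
```

The printed areas are close to 0.68, 0.95, and 0.997, which is why the rule is stated as an approximation.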
estimation
the process of determining the value of a population parameter from a sample statistic
expected value
the mean of a probability distribution
experiment
a study where the researcher deliberately influences individuals by imposing conditions and determining the individuals' responses to those conditions
experimental units
individuals (a person, a plot of land, a machine, or any single material unit) in an experiment
explanatory variable
explains the response variable, sometimes known as the treatment variable
exponential model
a model of the form y=ab^x
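One common way to fit an exponential model is to take logarithms of y, fit a line, and then back-transform. The sketch below assumes made-up data that follow y = 3·2^x exactly, so the recovered a and b are easy to check; real data would not fit this cleanly.

```python
from math import exp, log
from statistics import mean

# Made-up data generated from y = 3 * 2**x (an assumption for illustration).
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]

# Logarithmic transformation: ln(y) = ln(a) + x * ln(b) is linear in x.
log_ys = [log(y) for y in ys]

# Least-squares line through the points (x, ln y).
x_bar, ly_bar = mean(xs), mean(log_ys)
slope = (sum((x - x_bar) * (ly - ly_bar) for x, ly in zip(xs, log_ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = ly_bar - slope * x_bar

# Back-transform the linear coefficients to get the exponential model y = a * b**x.
a, b = exp(intercept), exp(slope)

def predict(x):
    """Predicted y from the back-transformed exponential model."""
    return a * b ** x

print(f"a = {a:.3f}, b = {b:.3f}, prediction at x = 5: {predict(5):.1f}")
```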
extrapolation
using a model to predict values far outside the range of the explanatory variable, which is prone to creating unreasonable predictions
factors
one or more explanatory variables in an experiment
first quartile
symbolized Q₁, represents the median of the lower 50% of a data set
five-number summary
the minimum, first quartile (Q₁), median, third quartile (Q₃), and maximum values in a data set
frequency table
a display organizing categorical or numerical data and how often each occurs
geometric distribution
the probability distribution of a geometric random variable X, all possible outcomes of X before the first success is seen and their associated probabilities
geometric random variable
a random variable X (a) that has two possible outcomes of each trial, (b) for which the probability of a success is constant for each trial, and (c) for which each trial is independent of the other trials
graphical display
a visual representation of a distribution
histogram
used with univariate data, frequencies are shown on the vertical axis and intervals or bins define the values on the horizontal axis
independent events
two events are called independent when knowing that one event has occurred does not change the probability that the second event occurs
independent random variables
if the values of one random variable have no association with the values of another, the two variables are called independent random variables
influential point
an extreme value whose removal would drastically change the slope of the least-squares regression model
interquartile range
describes the spread of the middle 50% of a data set, IQR=Q₃-Q₁
joint frequencies
frequencies for each cell in a two-way table relative to the total number of data
law of large numbers
the long-term relative frequency of an event gets closer to the true probability of the event as the number of trials of a random phenomenon increases
least-squares regression line (LSRL)
the "best-fit" line that is calculated by minimizing the sum of the squares of the differences between the observed and predicted values of the line
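A minimal sketch of the calculation, using the standard closed-form slope and intercept for a least-squares line. The data and the function name `least_squares_line` are made up for illustration.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar   # the line passes through (x_bar, y_bar)
    return intercept, slope

# Made-up bivariate data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)

# Residual = observed value minus predicted value.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"predicted y = {a:.2f} + {b:.2f}x")
print("sum of residuals:", round(sum(residuals), 10))
```

The residuals of a least-squares line sum to zero (up to rounding), which makes a handy sanity check.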
levels
the different quantities or categories of a factor in an experiment
linear regression
a method of finding the best model for a linear relationship between the explanatory and response variable
logarithmic transformation
procedure that changes a variable by taking the logarithm of each of its values
lurking variable
a variable that has an effect on the outcome of a study but was not part of the investigation
margin of error
a range of values to the left and right of a point estimate
marginal frequencies
row totals and column totals in a two-way table
matched-pairs design
the design of a study where experimental units are naturally paired by a common characteristic or with themselves in a before-after type of study
maximum
the largest numerical value in a data set
mean
the arithmetic average of a data set; the sum of all the values divided by the number of values
mean of a binomial random variable X
µx=np
mean of a discrete random variable
µx=∑xiP(xi)
mean of a geometric random variable
µx=1/p
measures of center
these locate the middle of a distribution; the mean and median are measures of center
median
the middle value of a data set; the equal areas point, where 50% of the data are at or below this value and 50% of the data are at or above this value
minimum
the smallest numerical value in a data set
mound-shaped
resembles a hill or mound; a distribution that is symmetric and unimodal
multiplication rule
P(A∩B)=P(A)×P(B|A) is used when we are interested in the probability of two events occurring simultaneously or in succession
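A short worked example of the multiplication rule, assuming two cards are drawn without replacement from a standard 52-card deck (the scenario is an illustration, not from the original card).

```python
from fractions import Fraction

# A: the first card is an ace. B: the second card is an ace.
p_first_ace = Fraction(4, 52)                # P(A): 4 aces among 52 cards
p_second_ace_given_first = Fraction(3, 51)   # P(B|A): 3 aces left among 51 cards

# Multiplication rule: P(A and B) = P(A) * P(B|A)
p_both_aces = p_first_ace * p_second_ace_given_first

print(p_both_aces)
```

Because the first draw changes what remains in the deck, the two events are dependent and P(B|A) differs from P(B).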
multistage sample
a sample resulting from multiple applications of cluster, stratified, and/or simple random sampling
nonresponse bias
the situation where an individual selected to be in the sample is unwilling or unable to provide data
normal distribution
a continuous probability distribution that appears in many situations, both natural and man-made; has a bell-shape and the area under the normal density curve is always equal to 1
null hypothesis
the hypothesis of no difference, no change, and no association; a statement of equality usually written in the form H₀:parameter=hypothesized value
observational study
attempts to determine relationships between variables, but the researcher imposes no conditions as in an experiment
observed values
actual outcomes or data from a study or an experiment
one-way table
a frequency table of one variable
outlier
an extreme value in a data set, quantified by being less than Q₁-1.5IQR or more than Q₃+1.5IQR
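The 1.5·IQR rule can be sketched as follows. Quartiles are computed here as the medians of the lower and upper halves of the sorted data, which is one common classroom convention; software packages may use other quartile methods and give slightly different fences. The data and function names are made up.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of each half."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                   # values below the median position
    upper = data[half + 1:] if n % 2 else data[half:]     # skip the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

def outliers(values):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

scores = [2, 4, 5, 6, 7, 8, 9, 10, 30]   # made-up data
print(five_number_summary(scores))
print(outliers(scores))
```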
percentiles
divide the data set into 100 equal parts; an observation at the Pth percentile is higher than P percent of all observations
placebo
a faux treatment given in an experiment that resembles the real treatment under consideration
placebo effect
a phenomenon where subjects show a response to a treatment merely because the treatment is imposed regardless of its actual effect
point estimate
an approximate value that has been calculated for the unknown parameter
population
the collection of all individuals under consideration in a study
population parameter
a characteristic or measure of a population
position
location of a data value relative to the population
power
the probability of correctly rejecting the null hypothesis when it is in fact false, equal to 1-β
power model
a function in the form y=ax^b
predicted value
the value of the given response variable predicted by a model for a given explanatory variable
probability
describes the chance that a certain outcome of a random phenomenon will occur
probability distribution
for a discrete random variable X, a listing or function that gives all n possible outcomes of the random variable (xi) and their associated probabilities P(xi)
P-value
the probability of observing a test statistic as extreme as, or more extreme than, the statistic obtained from a sample, under the assumption that the null hypothesis is true
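To make the definition concrete, here is a sketch of a two-sided P-value for a one-sample z test of a mean. It assumes the population standard deviation is known; the numbers and the function name `one_sample_z_test` are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu_0, sigma, n):
    """Two-sided z test for a mean, assuming a known population standard deviation."""
    sd_of_sample_mean = sigma / sqrt(n)
    z = (sample_mean - mu_0) / sd_of_sample_mean      # test statistic
    # Probability of a statistic at least this extreme, in either direction, if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Assumed example: H0: mu = 50, sample of n = 64 with mean 52, sigma = 8.
z, p_value = one_sample_z_test(sample_mean=52, mu_0=50, sigma=8, n=64)
alpha = 0.05

print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

A P-value below the chosen significance level leads to rejecting the null hypothesis; here the comparison is made against an assumed level of 0.05.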
quantitative
a variable whose values are counts or measurements
random digit table
a chance device that is used to select experimental units or conduct simulations
random phenomena
those outcomes that are unpredictable in the short term, but nevertheless have a long-term pattern
random sample
a sample composed of individuals selected by chance
random variables
numerical outcome of a random phenomenon
randomization
the process by which treatments are assigned by a chance mechanism to the experimental units
randomized block design
first units are sorted into subgroups or blocks, and then treatments are randomly assigned within the blocks
range
calculated as the maximum value minus the minimum value in a data set
relative frequency
percentage or proportion of the whole number of data
relative frequency segmented bar chart
a method of graphing a conditional distribution
replication
the practice of reducing chance variation by assigning each treatment to many experimental units
residual
observed value minus predicted value of the response variable
response bias
because of the manner in which an interview is conducted, because of the phrasing of questions, or because of the attitude of the respondent, inaccurate data are collected
response variable
measures the outcomes that have been observed
sample
a selected subset of a population from which data are gathered
sample statistic
a characteristic or measure of a sample
sample survey
a study that collects information from a sample of a population in order to determine one or more characteristics of the population
sampling distribution
the probability distribution of a sample statistic when a sample is drawn from a population
sampling distribution of the sample mean x̄
the distribution of sample means from all possible simple random samples of size n taken from a population
sampling distribution of a sample proportion p̂
the distribution of sample proportions from all possible simple random samples of size n taken from a population
sampling variability
natural variability due to the sampling process; each possible random sample from a population will generate a different sample statistic
scatterplots
used to visualize bivariate data; the explanatory variable is shown on the horizontal axis and the response variable is shown on the vertical axis
significance level
the probability of a Type I error; a benchmark against which the P-value is compared to determine if the null hypothesis will be rejected
simple random sample (SRS)
a sample where n individuals are selected from a population in a way that every possible combination of n individuals is equally likely
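A small sketch of drawing such a sample with Python's standard library, which selects without replacement so that every group of n individuals is equally likely. The roster, seed, and function name are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Select n distinct individuals so that every group of size n is equally likely."""
    rng = random.Random(seed)          # a seed makes the draw reproducible
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical roster of 30 students.
roster = [f"student_{i:02d}" for i in range(1, 31)]
chosen = simple_random_sample(roster, n=5, seed=2024)

print(chosen)
```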
simulation
a method of modeling chance behavior that accurately mimics the situation being considered
skewed
a unimodal, asymmetric distribution that tends to slant; most of the data are clustered on one side of the distribution, which "tails" off on the other side
standard deviation of a binomial random variable X
σ=√(np(1-p))
standard deviation of a discrete random variable X
σ=√(σ²)
standard deviation
used to measure variability of a data set, calculated as the square root of the variance of a set of data
standard error
an estimate of the standard deviation of the sampling distribution of a statistic
standard normal probabilities
the probabilities calculated from values of the standard normal distribution
standardized score
the number of standard deviations an observation lies from the mean; z=(observed-mean)/standard deviation
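A one-line sketch of the calculation. The subtraction must happen before the division, so the parentheses matter; the exam scores used here are made up.

```python
def z_score(observed, mean, standard_deviation):
    """Number of standard deviations the observation lies above (+) or below (-) the mean."""
    return (observed - mean) / standard_deviation

# Made-up example: a score of 85 on a test with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))   # 1.5 standard deviations above the mean
print(z_score(60, 70, 10))   # negative: below the mean
```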
statistically significant
when a sample statistic is shown to be far from a hypothesized parameter; when the P-value is less than the significance level
stemplot/stem-and-leaf plot
data are separated into a stem and a leaf by place value and organized in the form of a histogram
strata
subgroups of a population that are similar or homogeneous
stratification
part of the sampling process where units of the study are separated into strata
stratified random sample
a sample in which simple random samples are selected from each of several homogeneous subgroups of the population, known as strata
subjects
individuals in an experiment that are people
symmetric
the distribution that resembles a mirror image on either side of the center
systematic random sample
a sample where every kth individual is selected from a list or queue
test statistic
the number of standard deviations (standard errors) that a sample statistic lies from a hypothesized population parameter
third quartile
symbolized Q₃, represents the median of the upper 50% of a data set
transformation
changing the values of a data set using a mathematical operation
treatments
combinations of different levels of the factors in an experiment
two-way table
a frequency table that displays two categorical variables
type I error
rejecting a null hypothesis when it is in fact true
type II error
failing to reject the null hypothesis when it is in fact false
undercoverage
when some individuals of a population are not included in the sampling process
uniform
all data values in the distribution have similar frequencies
unimodal
a distribution with a single, clearly defined peak
univariate
one-variable data
variables
characteristics of the individuals under study
variability
the spread in a data set
variance
used to measure variability, the average of the squared deviations from the mean
variance of a binomial random variable
σ²=np(1-p)
variance of a discrete random variable
σ²=∑(xi-µx)²×P(xi)
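The mean and variance formulas for a discrete random variable can be applied directly to a small table of outcomes. The distribution below (number of heads in two fair coin flips) is an assumed example.

```python
from math import sqrt

# Assumed distribution: X = number of heads in two fair coin flips.
distribution = {0: 0.25, 1: 0.50, 2: 0.25}   # outcome xi -> probability P(xi)

# Mean: sum of each outcome times its probability.
mean_x = sum(x * p for x, p in distribution.items())

# Variance: probability-weighted average of squared deviations from the mean.
variance_x = sum((x - mean_x) ** 2 * p for x, p in distribution.items())

# Standard deviation: square root of the variance.
sd_x = sqrt(variance_x)

print(f"mean = {mean_x}, variance = {variance_x}, sd = {sd_x:.4f}")
```

Since this X is also binomial with n = 2 and p = 0.5, the results agree with the shortcuts np = 1 and np(1-p) = 0.5.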
venn diagram
graphical representation of sets or outcomes and how they intersect
voluntary response bias
bias due to the manner in which people choose to respond to voluntary surveys
voluntary response sample
composed of individuals who choose to respond to a survey because of interest in the subject