Addition Rule

P(A ∪ B) = P(A) + P(B) - P(A ∩ B). This rule aids in computing the probability that at least one of two events occurs.
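
As a quick check of the addition rule, P(A ∪ B) = P(A) + P(B) - P(A ∩ B), the sketch below enumerates one roll of a fair six-sided die. The die and the events A (even roll) and B (roll greater than 3) are illustrative assumptions, not part of the glossary.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (illustrative assumption).
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event A: the roll is even
B = {4, 5, 6}   # event B: the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

# Addition rule: subtract the overlap so it is not counted twice.
p_union = prob(A) + prob(B) - prob(A & B)

print(p_union)      # 2/3
print(prob(A | B))  # 2/3, found by counting the union directly
```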

Alpha (α)

The probability of a Type I error. See significance level.

Alternative Hypothesis

The hypothesis stating what the researcher is seeking evidence of. A statement of inequality. It can be written looking for the difference or change in one direction from the null hypothesis or both.

Association

Relationship between or among variables.

Back-Transform

The process by which values are substituted into a model of transformed data, and then reversing the transforming process to obtain the predicted value or model for nontransformed data.

Bar Chart

A graphical display used with categorical data, where frequencies for each category are shown in vertical bars.

Bell-Shaped

Often used to describe the normal distribution. See mound-shaped.

Beta (β)

The probability of a Type II error. See power.

Bias

The term for systematic deviation from the truth (parameter), caused by systematically favoring some outcomes over others.

Biased

A sampling method is biased if it tends to produce samples that do not represent the population.

Bimodal

A distribution with two clear peaks.

Binomial Distribution

The probability distribution of a binomial random variable.

Binomial Random Variable

A random variable x (a) that has a fixed number of trials of a random phenomenon n, (b) that has only two possible outcomes on each trial, (c) for which the probability of a success is constant for each trial, and (d) for which each trial is independent of other trials.

Bins

The intervals that define the "bars" of a histogram.

Bivariate Data

Consists of two variables, an explanatory and a response variable, usually quantitative.

Blinding

Practice of denying knowledge to subjects about which treatment is imposed upon them.

Blocks

Subgroups of the experimental units that are separated by some characteristic before treatments are assigned because they may respond differently to the treatments.

Box-And-Whisker Plot/Boxplot

A graphical display of the five-number summary of a set of data, which also shows outliers.

Categorical Variable

A variable recorded as labels, names, or other non-numerical outcomes.

Census

A study that observes, or attempts to observe, every individual in a population.

Central Limit Theorem

As the size n of a simple random sample increases, the shape of the sampling distribution of x̄ tends toward being normally distributed.

Chance Device

A mechanism used to determine random outcomes.

Cluster Sample

A sample in which a simple random sample of heterogeneous subgroups of a population is selected.

Clusters

Heterogeneous subgroups of a population.

Coefficient of Determination (r²)

Percent of variation in the response variable explained by its linear relationship with the explanatory variable.

Complement

The complement of an event is that event not occurring.

Completely Randomized Design

One in which all experimental units are assigned treatments solely by chance.

Conditional Distribution

See conditional frequencies.

Conditional Frequencies

Relative frequencies for each cell in a two-way table relative to one variable.

Conditional Probability

The probability of an event occurring given that another has occurred. The probability of A given that B has occurred is denoted as P(A|B).

Confidence Intervals

Give an estimated range that is likely to contain an unknown population parameter.

Confidence Level

The level of certainty that a population parameter exists in the calculated confidence interval.

Confounding

The situation where the effects of two or more explanatory variables on the response variable cannot be separated.

Confounding Variable

A variable whose effect on the response variable cannot be untangled from the effects of the treatment.

Contingency Table

See two-way table.

Continuous Random Variables

Those typically found by measuring, such as heights or temperatures.

Control Group

A baseline group that may be given no treatment, a faux treatment like a placebo, or an accepted treatment that is to be compared to another.

Control

The principle that potential sources of variation due to variables not under consideration must be reduced.

Convenience Sample

Composed of individuals who are easily accessed or contacted.

Correlation Coefficient (r)

A measure of the strength of a linear relationship,

r = (1/(n-1)) Σ ((xi - x̄)/sx)((yi - ȳ)/sy).
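
The formula for r can be applied term by term, as in the minimal sketch below. The paired data are hypothetical and the helper names (`mean`, `sample_sd`, `correlation`) are introduced here only for illustration.

```python
import math

# Hypothetical paired data (explanatory x, response y), for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """Sum of products of standardized x and y values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(xs, ys))
    return total / (n - 1)

r = correlation(x, y)
print(round(r, 3))  # 0.775
```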

Critical Value

The value that the test statistic must exceed in order to reject the null hypothesis. When computing a confidence interval, the value of t* (or z*) where ±t* (or ±z*) bounds the central C% of the t (or z) distribution.

Cumulative Frequency

The sums of the frequencies of the data values from smallest to largest.

Data Set

Collection of observations from a sample or population.

Dependent Events

Two events are called dependent when they are related and the fact that one event has occurred changes the probability that the second event occurs.

Discrete Random Variables

Those usually obtained by counting.

Disjoint Events

Events that cannot occur simultaneously.

Distribution

Frequencies of values in a data set.

Dotplot

A graphical display used with univariate data. Each data point is shown as a dot located above its numerical value on the horizontal axis.

Double-Blind

When both the subjects and data gatherers are ignorant about which treatment a subject received.

Empirical Rule (68-95-99.7 Rule)

Gives benchmarks for understanding how probability is distributed under a normal curve. In a normal distribution, about 68% of the observations are within one standard deviation of the mean, about 95% are within two standard deviations of the mean, and about 99.7% are within three standard deviations of the mean.
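
The 68-95-99.7 benchmarks can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `within` is an assumption made for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def within(k):
    """Probability that an observation lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```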

Estimation

The process of determining the value of a population parameter from a sample statistic.

Expected Value

The mean of a probability distribution.

Experiment

A study where the researcher deliberately influences individuals by imposing conditions and determining the individuals' responses to those conditions.

Experimental Units

Individuals (a person, a plot of land, a machine, or any single material unit) in an experiment.

Explanatory Variable

Explains the response variable, sometimes known as the treatment variable.

Exponential Model

A model of the form y = abˣ.

Extrapolation

Using a model to predict values far outside the range of the explanatory variable, which is prone to creating unreasonable predictions.

Factors

One or more explanatory variables in an experiment.

First Quartile

Symbolized Q1, represents the median of the lower 50% of a data set.

Five-Number Summary

The minimum, first quartile (Q1), median, third quartile (Q3), and maximum values in a data set.

Frequency Table

A display organizing categorical or numerical data and how often each occurs.

Geometric Distribution

The probability distribution of a geometric random variable X. All possible outcomes of X before the first success is seen and their associated probabilities.

Geometric Random Variable

A random variable X (a) that has two possible outcomes of each trial, (b) for which the probability of a success is constant for each trial, and (c) for which each trial is independent of the other trials.

Graphical Display

A visual representation of a distribution.

Histogram

Used with univariate data, frequencies are shown on the vertical axis, and intervals or bins define the values on the horizontal axis.

Independent Events

Two events are called independent when knowing that one event has occurred does not change the probability that the second event occurs.

Independent Random Variables

If the values of one random variable have no association with the values of another, the two variables are called independent random variables.

Influential Point

An extreme value whose removal would drastically change the slope of the least-squares regression model.

Interquartile Range

Describes the spread of middle 50% of a data set, IQR = Q3 - Q1.

Joint Distribution

See joint frequencies.

Joint Frequencies

Frequencies for each cell in a two-way table relative to the total number of data.

Law of Large Numbers

The long-term relative frequency of an event gets closer to the true relative frequency as the number of trials of a random phenomenon increases.

Least-Squares Regression Line (LSRL)

The "best-fit" line that is calculated by minimizing the sum of the squares of the differences between the observed and predicted values of the line. The LSRL has the equation ŷ = b₀ + b₁x.
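
A minimal sketch of fitting a least-squares line by hand follows. The glossary does not give formulas for the coefficients; the code assumes the standard results that the slope is the sum of cross-deviations divided by the sum of squared x-deviations, and that the line passes through (x̄, ȳ). The data are hypothetical.

```python
# Hypothetical paired data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: sum of cross-deviations over sum of squared x-deviations.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
# Intercept: chosen so the line passes through (x_bar, y_bar).
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]  # observed minus predicted

print(round(b0, 2), round(b1, 2))  # 2.2 0.6
```

The residuals of a least-squares line sum to zero (up to rounding), which is a convenient sanity check on the fit.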

Levels

The different quantities or categories of a factor in an experiment.

Linear Regression

A method of finding the best model for a linear relationship between the explanatory and response variable.

Logarithmic Transformation

Procedure that changes a variable by taking the logarithm of each of its values.

Lurking Variable

A variable that has an effect on the outcome of a study but was not part of the investigation.

Margin of Error

A range of values to the left and right of a point estimate.

Marginal Distribution

See marginal frequencies.

Marginal Frequencies

Row totals and column totals in a two-way table.

Matched-Pairs Design

The design of a study where experimental units are naturally paired by a common characteristic, or with themselves in a before-after type of study.

Maximum

The largest numerical value in a data set.

Mean

The arithmetic average of a data set; the sum of all the values divided by the number of values, x̄ = (Σxi)/n.

Mean of a Binomial Random Variable X

μx = np.

Mean of a Discrete Random Variable

μx = Σ from i=1 to n of xiP(xi).

Mean of a Geometric Random Variable

μx=1/p.

Measures of Center

These locate the middle of a distribution. The mean and median are measures of center.

Median

The middle value of a data set; the equal areas point, where 50% of the data are at or below this value, and 50% of the data are at or above this value.

Minimum

The smallest numerical value in a data set.

Mound-Shaped

Resembles a hill or mound; a distribution that is symmetric and unimodal.

Multiplication Rule

P(A ∩ B) = P(A) * P(B|A) is used when we are interested in the probability of two events occurring simultaneously, or in succession.
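
As an illustration of the multiplication rule, the sketch below finds the probability of drawing two aces in succession, without replacement, from a standard 52-card deck. The card scenario is an assumption chosen for the example.

```python
from fractions import Fraction

# Event A: the first card drawn is an ace (4 aces among 52 cards).
p_first_ace = Fraction(4, 52)
# Event B given A: the second card is an ace once one ace is gone (3 among 51).
p_second_ace_given_first = Fraction(3, 51)

# Multiplication rule: P(A and B) = P(A) * P(B|A)
p_both_aces = p_first_ace * p_second_ace_given_first

print(p_both_aces)  # 1/221
```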

Multistage Sample

A sample resulting from multiple applications of cluster, stratified, and/or simple random sampling.

Mutually Exclusive Events

See disjoint events.

Nonresponse Bias

The situation where an individual selected to be in the sample is unwilling, or unable, to provide data.

Normal Distribution

A continuous probability distribution that appears in many situations, both natural and man-made. It has a bell-shape and the area under the normal density curve is always equal to 1.

Null Hypothesis

The hypothesis of no difference, no change, and no association. A statement of equality, usually written in the form H₀: parameter = hypothesized value.

Observational Study

Attempts to determine relationships between variables, but the researcher imposes no conditions as in an experiment.

Observed Values

Actual outcomes or data from a study or an experiment.

One-Way Table

A frequency table of one variable.

Outlier

An extreme value in a data set. Quantified by being less than Q1 - 1.5 × IQR or more than Q3 + 1.5 × IQR.
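
The 1.5 × IQR rule can be sketched as follows. The data are hypothetical, and the quartiles are computed as the medians of the lower and upper halves of the sorted data, matching this glossary's definitions of Q1 and Q3; other software may use slightly different quartile conventions.

```python
from statistics import median

# Hypothetical data set, for illustration only.
data = sorted([2, 4, 5, 6, 7, 8, 9, 10, 30])

half = len(data) // 2
lower = data[:half]    # lower half (the middle value is excluded when n is odd)
upper = data[-half:]   # upper half

q1, q3 = median(lower), median(upper)
iqr = q3 - q1

low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

outliers = [v for v in data if v < low_fence or v > high_fence]
print(q1, q3, iqr)  # 4.5 9.5 5.0
print(outliers)     # [30]
```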

Percentiles

Divide the data set into 100 equal parts. An observation at the Pth percentile is higher than P percent of all observations.

Placebo

A faux treatment given in an experiment that resembles the real treatment under consideration.

Placebo Effect

A phenomenon where subjects show a response to a treatment merely because the treatment is imposed regardless of its actual effect.

Point Estimate

An approximate value that has been calculated for the unknown parameter.

Population

The collection of all individuals under consideration in a study.

Population Parameter

A characteristic or measure of a population.

Position

Location of a data value relative to the population.

Power

The probability of correctly rejecting the null hypothesis when it is in fact false. Equal to 1 - β. See beta and Type II error.

Power Model

A function in the form of y = axᵇ.

Predicted Value

The value of the response variable predicted by a model for a given explanatory variable.

Probability

Describes the chance that a certain outcome of a random phenomenon will occur.

Probability Distribution

The probability distribution of a discrete random variable X is a function of all n possible outcomes of the random variable (xi) and their associated probabilities P(xi).

Probability Sample

Composed of individuals selected by chance.

P-Value

The probability of observing a test statistic as extreme as, or more extreme than, the statistic obtained from a sample, under the assumption that the null hypothesis is true.

Quantitative

A variable whose values are counts or measurements.

Random Digit Table

A chance device that is used to select experimental units or conduct simulations.

Random Phenomena

Those outcomes that are unpredictable in the short term, but nevertheless, have a long-term pattern.

Random Sample

A sample composed of individuals selected by chance.

Random Variables

Numerical outcome of a random phenomenon.

Randomization

The process by which treatments are assigned by a chance mechanism to the experimental units.

Randomized Block Design

First, units are sorted into subgroups or blocks, and then treatments are randomly assigned within the blocks.

Range

Calculated as the maximum value minus the minimum value in a data set.

Relative Frequency

Percentage or proportion of the whole number of data.

Replication

The practice of reducing chance variation by assigning each treatment to many experimental units.

Residual

Observed value minus predicted value of the response variable.

Response Bias

Because of the manner in which an interview is conducted, because of the phrasing of questions, or because of the attitude of the respondent, inaccurate data are collected.

Response Variable

Measures the outcomes that have been observed.

Sample

A selected subset of a population from which data are gathered.

Sample Statistic

Result of a sample used to estimate a parameter.

Sample Survey

A study that collects information from a sample of a population in order to determine one or more characteristics of the population.

Sampling Distribution

The probability distribution of a sample statistic when a sample is drawn from a population.

Sampling Distribution of the Sample Mean (x̄)

The distribution of sample means from all possible simple random samples of size n taken from a population.

Sampling Distribution of a Sample Proportion p̂

The distribution of sample proportions from all possible simple random samples of size n taken from a population.

Sampling Error

See sampling variability.

Sampling Variability

Natural variability due to the sampling process. Each possible random sample from a population will generate a different sample statistic.

Scatterplots

Used to visualize bivariate data. The explanatory variable is shown on the horizontal axis and the response variable is shown on the vertical axis.

Significance Level

The probability of a Type I error. A benchmark against which the P-value is compared to determine if the null hypothesis will be rejected. See also alpha.

Simple Random Sample (SRS)

A sample where n individuals are selected from a population in a way that every possible combination of n individuals is equally likely.

Simulation

A method of modeling chance behavior that accurately mimics the situation being considered.

Skewed

A unimodal, asymmetric distribution that tends to slant; most of the data are clustered on one side of the distribution, which "tails" off on the other side.

Standard Deviation of a Binomial Random Variable X

σₓ=√(np(1-p)).
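
The shortcut formulas μx = np and σₓ = √(np(1-p)) agree with the general definitions for a discrete random variable. The sketch below checks this for hypothetical values n = 10 and p = 0.3 by building the binomial probabilities directly.

```python
import math

n, p = 10, 0.3  # hypothetical number of trials and probability of success

# Binomial probabilities P(X = k) for k = 0, 1, ..., n.
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Mean and standard deviation from the discrete-random-variable definitions.
mean_x = sum(k * pk for k, pk in enumerate(pmf))
sd_x = math.sqrt(sum((k - mean_x) ** 2 * pk for k, pk in enumerate(pmf)))

# Shortcut formulas for a binomial random variable.
mean_shortcut = n * p
sd_shortcut = math.sqrt(n * p * (1 - p))

print(round(mean_x, 6), round(mean_shortcut, 6))
print(round(sd_x, 6), round(sd_shortcut, 6))
```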

Standard Deviation of a Discrete Random Variable X

σₓ=√(σ²ₓ).

Standard Deviation

Used to measure variability of a data set. It is calculated as the square root of the variance of a set of data,

s = √(Σ(xi-x̄)²/(n-1)).
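
A minimal sketch of the sample standard deviation, computed step by step with the n - 1 divisor and compared with `statistics.stdev` from Python's standard library. The data are hypothetical.

```python
import math
import statistics

# Hypothetical data set, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
x_bar = sum(data) / n

# Sum of squared deviations from the mean, divided by n - 1, then square-rooted.
s = math.sqrt(sum((xi - x_bar) ** 2 for xi in data) / (n - 1))

print(round(s, 3))                       # 2.138
print(round(statistics.stdev(data), 3))  # 2.138
```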

Standard Error

An estimate of the standard deviation of the sampling distribution of a statistic.

Standard Normal Probabilities

The probabilities calculated from values of the standard normal distribution.

Standardized Score

The number of standard deviations an observation lies from the mean,

z = (observation - mean) / (standard deviation).
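
For instance, with a hypothetical mean of 100 and standard deviation of 15, an observation of 130 lies two standard deviations above the mean. The function name `z_score` is introduced only for this sketch.

```python
def z_score(observation, mean, standard_deviation):
    """Number of standard deviations an observation lies from the mean."""
    return (observation - mean) / standard_deviation

z = z_score(130, 100, 15)
print(z)  # 2.0
```

A negative result indicates an observation below the mean.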

Statistically Significant

When a sample statistic is shown to be far from a hypothesized parameter. When the P-value is less than the significance level.

Stemplot

Also called a stem-and-leaf plot. Data are separated into a stem and leaf by place value and organized in the form of a histogram.

Strata

Subgroups of a population that are similar or homogeneous.

Stratification

Part of the sampling process where units of the study are separated into strata.

Stratified Random Sample

A sample in which simple random samples are selected from each of several homogeneous subgroups of the population, known as strata.

Subjects

Individuals in an experiment that are people.

Symmetric

The distribution that resembles a mirror image on either side of the center.

Systematic Random Sample

A sample where every kth individual is selected from a list or queue.

Test Statistic

The number of standard deviations (standard errors) that a sample statistic lies from a hypothesized population parameter.

Third Quartile

Symbolized Q3, represents the median of the upper 50% of a data set.

Transformation

Changing the values of a data set using a mathematical operation.

Treatments

Combinations of different levels of the factors in an experiment.

Two-Way Table

A frequency table that displays two categorical variables.

Type I Error

Rejecting a null hypothesis when it is in fact true.

Type II Error

Failing to reject a null hypothesis when it is in fact false.

Undercoverage

When some individuals of a population are not included in the sampling process.

Uniform

All data values in the distribution have similar frequencies.

Unimodal

A distribution with a single, clearly defined, peak.

Univariate

One-variable data.

Variables

Characteristics of the individuals under study.

Variability

The spread in a data set.

Variance

Used to measure variability, the average of the squared deviations from the mean,

s²ₓ = Σ(xi-x̄)²/(n-1).

Variance of a Binomial Random Variable X

σ²ₓ = np(1-p).

Variance of a Discrete Random Variable X

σ²ₓ = Σ from i=1 to n of (xi-μₓ)²P(xi).
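
The mean and variance of a discrete random variable can be computed directly from its probability distribution. The sketch below assumes X is the number of heads in two tosses of a fair coin, an example chosen for illustration.

```python
import math
from fractions import Fraction

# Probability distribution of X = number of heads in two fair coin tosses.
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Mean: sum of each outcome times its probability.
mean_x = sum(x * p for x, p in dist.items())

# Variance: sum of squared deviations from the mean, weighted by probability.
var_x = sum((x - mean_x) ** 2 * p for x, p in dist.items())

# Standard deviation: square root of the variance.
sd_x = math.sqrt(var_x)

print(mean_x, var_x)  # 1 1/2
```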

Venn Diagram

Graphical representation of sets or outcomes and how they intersect.

Voluntary Response Bias

Bias due to the manner in which people choose to respond to voluntary surveys.

Voluntary Response Sample

Composed of individuals who choose to respond to a survey because of interest in the subject.

Z-Score

See standardized score.