statistics

the science of gathering, describing and analyzing data // or // the numerical descriptions of sample data

population

a group of interest about which one is trying to make an inference

variables

characteristics whose values vary among members of the population

data

information gathered about a specific variable or variables

census

data/survey gathered from every member of a population (usually done only when the population is small)

parameter

a numerical description of a population characteristic

(sample) statistic

a numerical description of a sample characteristic

sample

a subset of the population

descriptive statistics

gathers, sorts, summarizes, and describes collected data

inferential statistics

uses descriptive statistics to estimate population parameters

quantitative data

data that is numerical in form (and the mean is meaningful)

qualitative data

also called categorical data; data that does not naturally take numerical values (or numerical data for which the mean is not meaningful, such as jersey numbers)

continuous data

quantitative data that can take on any value in an interval (including decimals and fractions)

discrete data

quantitative data that can take on only separate, countable values, typically whole numbers (such as the number of cats in a family)

nominal level of measurement

qualitative data made up of names or labels that cannot be intrinsically ordered

ordinal level of measurement

qualitative data that has a natural order, but is non-numerical (the mean is not meaningful)

interval level of measurement

quantitative data with meaningful differences but no "true" zero, so ratios of values are not meaningful (e.g., 20 °C is not twice as hot as 10 °C)

ratio level of measurement

quantitative data with a "true" zero, so ratios of values are meaningful (e.g., 20 kg is twice as heavy as 10 kg)

observational study

a study that observes natural behavior and records it for analysis without interference

experimental study

contrives a situation (by assigning treatments to subjects) to test whether a particular variable causes a change in the response

representative sample

a sample with the same relevant characteristics as the population

simple random sample

every member of the population has an equal chance of being chosen (more strictly, every possible sample of the chosen size is equally likely)

stratified sample

a population is divided into groups (strata) and then a random sample is taken from each group

cluster sample

a population is divided into clusters, some of the clusters are randomly selected, and all members of the chosen clusters are surveyed

systematic sample

a sample is selected by choosing every nth member from an ordered list of the population, starting at a randomly chosen member among the first n (n depends on how large a sample is needed)
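
The four random sampling methods above can be sketched with Python's standard `random` module. This is a minimal illustration, not a survey-sampling library: the function names are hypothetical, strata and clusters are assumed to be given as dictionaries of member lists, and the stratified version assumes the same number is drawn from each stratum (proportional allocation is also common).

```python
import random

def simple_random_sample(population, n, rng):
    # Every subset of size n is equally likely to be drawn.
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    # strata: dict of stratum name -> list of members (assumed input shape).
    # Draw a simple random sample of the same size within each stratum.
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # clusters: dict of cluster name -> list of members (assumed input shape).
    # Randomly pick whole clusters, then keep every member of each chosen one.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def systematic_sample(population, n, rng):
    # Random starting point among the first n members, then every nth member.
    start = rng.randrange(n)
    return population[start::n]

rng = random.Random(42)       # fixed seed so runs are repeatable
people = list(range(1, 101))  # hypothetical population of 100 IDs
print(simple_random_sample(people, 10, rng))
print(stratified_sample({"A": people[:40], "B": people[40:]}, 5, rng))
print(cluster_sample({f"block{i}": people[i * 10:(i + 1) * 10] for i in range(10)}, 2, rng))
print(systematic_sample(people, 10, rng))
```

Passing an explicit `random.Random` instance keeps the draws reproducible, which is handy when checking a classroom example.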

convenience sample

the sample is selected so that it is "convenient" to the researcher and not necessarily representative

cross-sectional study

a study conducted as a snapshot in time

longitudinal study

a study conducted over an extended period of time

meta-analysis

a study that compiles information from previous studies

case study

a study that looks at multiple variables that affect a single event

treatment

one category of a variable controlled by a study (can include placebo)

subjects

the people or things an experiment is conducted on

response variable

the variable measured at the end of an experiment (variable that responds to the treatment)

explanatory variable

the variable that is thought to cause a change in the response variable (treatment variable)

treatment group

the group in a study receiving the active (non-placebo) treatment

control group

the group in a study receiving the placebo (or no treatment), which serves as a baseline for comparison with the treatment group

confounding variable

factors other than the explanatory variable that can also affect the response variable

placebo effect

positive response to the suggestion that a subject is being treated even when they are not receiving the active treatment

placebo

inert substance used in place of an active treatment in blind or double-blind studies

nocebo

like a placebo, but with a negative effect on the response variable rather than a positive one

single-blind experiment

researchers interacting with subjects know which subjects are receiving the active treatment or the placebo, but the subjects are not told

double-blind experiment

researchers interacting with subjects do not know which subjects are receiving the active treatment or the placebo, and the subjects don't know either

institutional review board

a group that determines if the conditions of an experimental design are ethically sound and won't harm subjects

informed consent

subjects must know the scope and procedures of a study, including any possible risks, before agreeing to participate

bias

a systematic tendency in how data are collected, analyzed, or reported that favors a particular outcome

sampling bias

bias in a study created by a non-representative sample

non-adherence

a kind of bias created when participants drop out before the study is complete, or fail to follow all the required procedures

processing errors

errors that are not caused by sampling or other design problems but end up in the data through human or machine error (such as typos during data entry)

researcher bias

intentional or unintentional bias created by a researcher not being fully objective or desiring or expecting a particular outcome

response bias

a bias created by respondents to surveys who make errors in their responses or deliberately lie (for instance, in disclosing sensitive information)

participation bias

occurs in voluntary response samples when participation in a study is self-selected and not random

non-response bias

due to lack of response to a randomized survey; very common in modern telephone surveys where selected participants may not answer the phone

distribution

a way of describing a particular dataset or population

frequencies

counts of data values

class

category, or interval of values for a particular variable

class width

for a quantitative variable, the range of a class: the difference between the lower and upper limits of a class (continuous case), or between the lower limit of one class and the lower limit of the next class (discrete case)

relative frequency

proportion of a sample in a particular class

cumulative frequency

number (or proportion) of a sample falling in a particular class or in any class below it
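
A frequency table with relative and cumulative frequencies can be built in a few lines. This sketch assumes integer data and half-open classes [lower, upper). `frequency_table` is a hypothetical helper, and the width rule (range divided by the number of classes, rounded down, plus 1) is one common textbook convention that guarantees the maximum value lands in the last class.

```python
def frequency_table(data, num_classes):
    """Return rows of (lower, upper, freq, rel_freq, cum_freq, cum_rel_freq).

    Assumes integer data; each class is the half-open interval [lower, upper).
    """
    lo, hi = min(data), max(data)
    # Class width: rounding up this way keeps the maximum inside the last class.
    width = (hi - lo) // num_classes + 1
    n = len(data)
    rows = []
    cumulative = 0
    for i in range(num_classes):
        lower = lo + i * width
        upper = lower + width
        freq = sum(1 for x in data if lower <= x < upper)
        cumulative += freq
        rows.append((lower, upper, freq, freq / n, cumulative, cumulative / n))
    return rows

values = [1, 2, 2, 3, 5, 7, 8, 9, 10]  # hypothetical data
for lower, upper, f, rf, cf, crf in frequency_table(values, 3):
    print(f"[{lower}, {upper}): f={f}, rel={rf:.2f}, cum={cf}, cum rel={crf:.2f}")
```

The last cumulative relative frequency is always 1 (every observation is at or below the top class), which is a quick sanity check on any hand-built table.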

pie chart

a graph showing relative frequencies as proportions of a circle (usually of categorical data)

bar graph

a graph of categorical data that uses bars to represent the frequency in each class/category

Pareto chart

a bar graph sorted by frequency (typically highest to lowest)
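
The ordering behind a Pareto chart is just a frequency count sorted from highest to lowest. A minimal sketch using `collections.Counter`; `pareto_table` and the survey responses are hypothetical, and the "bars" are drawn as text rather than as a real chart.

```python
from collections import Counter

def pareto_table(categories):
    # Count each category, then order from most to least frequent.
    # Ties keep the order in which categories were first encountered.
    return Counter(categories).most_common()

responses = ["bus", "car", "car", "bike", "car", "bus", "walk"]  # hypothetical
for category, count in pareto_table(responses):
    print(f"{category:<5} {'#' * count}")
```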

histogram

a graph of quantitative data that divides the data into classes of equal width and displays the frequencies in each class with bars

frequency polygon

a line graph that plots the frequency of each class (usually at the class midpoint) or of individual values, connecting the points with straight lines

ogive graph

a cumulative frequency line graph

stem-and-leaf plot (or stemplot)

similar to a histogram, but displays original data
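
A text stem-and-leaf plot shows how the original values are kept: each value is split into a stem and a leaf, so `1 | 25` stands for 12 and 15. This sketch assumes non-negative integers with the tens digits as the stem and the units digit as the leaf; `stem_and_leaf` is a hypothetical helper. Empty stems are still printed so gaps in the data stay visible.

```python
from collections import defaultdict

def stem_and_leaf(data):
    # Assumes non-negative integers: stem = value // 10, leaf = units digit.
    stems = defaultdict(list)
    for x in sorted(data):
        stems[x // 10].append(x % 10)
    lines = []
    # Include empty stems so gaps in the distribution are visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(d) for d in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return "\n".join(lines)

print(stem_and_leaf([12, 15, 21, 21, 38, 34]))  # hypothetical data
```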

dot plot

like a histogram, but with a dot for each individual observation; best used with discrete data that has a limited number of outcomes

line graph

two-dimensional data (frequently time vs. another variable) where measurements (dots) are connected with straight lines

uniform distribution

a graph whose bars (frequencies of classes or values) are approximately the same height for all outcomes

symmetric distribution

a graph that has right-left symmetry (typically bell-shaped)

skewed right distribution

a graph with a tail stretching into higher values (on the right)

skewed left distribution

a graph with a tail stretching into lower values (on the left)

mean (arithmetic mean)

sum of the values divided by the number of values (average)

median

the middle value of the ordered data (marks 50% below, 50% above); with an even number of values, it is the mean of the two middle values

mode

most common value (for small data sets, report 'no mode' if more than two modes; large datasets can be multi-modal)
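
Python's standard `statistics` module covers the three measures of center directly (`multimode` needs Python 3.8 or later). The data and the `mode_report` helper are hypothetical. The helper applies the small-dataset convention stated above, reporting "no mode" when there are more than two modes.

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 8, 9]  # hypothetical sample

mean = statistics.mean(data)        # 45 / 8 = 5.625
median = statistics.median(data)    # even count: (5 + 7) / 2 = 6.0
modes = statistics.multimode(data)  # [3, 8], both appear twice

def mode_report(values):
    # Small-dataset convention: more than two modes -> report "no mode".
    found = statistics.multimode(values)
    return "no mode" if len(found) > 2 else found

print(mean, median, mode_report(data))
```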

range

the difference between the maximum value and the minimum value

the standard deviation

a measure of how much we might expect a typical value in a dataset to differ from the mean

variance

the square of the standard deviation

coefficient of variation

the standard deviation divided by the mean
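
The measures of spread above can be computed with the `statistics` module. The data set is hypothetical, and the sketch assumes the values are a sample, so it uses `stdev`/`variance` (dividing by n - 1). The population versions `pstdev`/`pvariance` divide by n instead.

```python
import statistics

data = [4, 8, 6, 5, 3, 7]  # hypothetical sample, mean = 5.5

data_range = max(data) - min(data)    # 8 - 3 = 5
variance = statistics.variance(data)  # sample variance: 17.5 / (6 - 1) = 3.5
sd = statistics.stdev(data)           # sample standard deviation = sqrt(3.5)
cv = sd / statistics.mean(data)       # coefficient of variation, about 0.34

# Population versions divide the sum of squared deviations by n instead.
pop_variance = statistics.pvariance(data)  # 17.5 / 6
pop_sd = statistics.pstdev(data)

print(data_range, variance, round(sd, 4), f"{cv:.1%}")
```

Because the coefficient of variation has no units, it is often reported as a percentage and used to compare spread across data sets measured on different scales.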

empirical rule // 68-95-99.7 rule

for bell-shaped (approximately normal) distributions: about 68% of data within one standard deviation of the mean; about 95% within two standard deviations of the mean; about 99.7% within three standard deviations of the mean
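
The rule's percentages are rounded areas under the standard normal curve. They can be checked with `statistics.NormalDist` (Python 3.8 or later) by taking the probability between -k and +k standard deviations.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
# Area between -k and +k standard deviations of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} SD: {p:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

For data that are skewed or otherwise far from bell-shaped, these proportions can be quite different.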

percentile

Pth percentile means that P% of the data is at or below that given value
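
Following the definition above, the percentile of a given value is the percentage of observations at or below it. A minimal sketch with a hypothetical `percentile_rank` helper and made-up scores:

```python
def percentile_rank(data, value):
    # Percentage of observations at or below `value`.
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

scores = [55, 62, 70, 70, 78, 81, 85, 90, 94, 99]  # hypothetical exam scores
print(percentile_rank(scores, 81))  # 6 of 10 scores are <= 81 -> 60.0
```

Going the other way (finding the value at a given percentile) requires an interpolation rule, and conventions differ between textbooks and software; Python's `statistics.quantiles` offers `method="exclusive"` (the default) and `method="inclusive"`.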

decile

n/10 of the data is at or below the nth decile

quartile

n/4 of the data is at or below the nth quartile

5-number summary

minimum, first quartile, median, third quartile, maximum

interquartile range (IQR)

the difference between Q3 and Q1

lower fence

the boundary that marks outliers (below this value), calculated as Q1 - 1.5(IQR)

upper fence

the boundary that marks outliers (above this value), calculated as Q3 + 1.5(IQR)
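
The 5-number summary, IQR, fences, and outlier check can be sketched together. Quartile conventions vary. This sketch assumes the "median of each half" method, where the overall median is left out of both halves when n is odd, so other software may give slightly different Q1 and Q3. The function names and data are hypothetical.

```python
import statistics

def five_number_summary(data):
    # Quartiles via the "median of each half" method; needs at least 2 values.
    xs = sorted(data)
    n = len(xs)
    lower_half = xs[: n // 2]
    upper_half = xs[(n + 1) // 2 :]
    return (xs[0], statistics.median(lower_half), statistics.median(xs),
            statistics.median(upper_half), xs[-1])

def fences(data):
    # Lower fence Q1 - 1.5(IQR), upper fence Q3 + 1.5(IQR).
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(data):
    lower, upper = fences(data)
    return [x for x in data if x < lower or x > upper]

data = [1, 3, 4, 6, 7, 8, 9, 12, 30]  # hypothetical sample
print(five_number_summary(data))  # (1, 3.5, 7, 10.5, 30)
print(fences(data))               # IQR = 7, so (-7.0, 21.0)
print(outliers(data))             # [30]
```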

standard score

the difference between the observation and the mean, divided by the standard deviation
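
The standard score (often called a z-score) translates directly into code. The exam mean and standard deviation below are hypothetical.

```python
def z_score(x, mean, sd):
    # Number of standard deviations x lies above (+) or below (-) the mean.
    return (x - mean) / sd

# Hypothetical exam: mean 70, standard deviation 8.
print(z_score(86, 70, 8))  # 2.0
print(z_score(62, 70, 8))  # -1.0
```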

box-and-whisker plot // boxplot

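a graph created from the 5-number summary (when there are no outliers; if there are outliers, they are plotted as individual points and the whiskers extend only to the most extreme non-outlier values)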

scatterplot

a two-dimensional plot of paired data points that are not connected with lines

correlation coefficient

also called the Pearson correlation coefficient (r); measures the strength and direction of the linear relationship between two quantitative variables; values satisfy -1 <= r <= 1
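
Pearson's r can be computed from deviations from the means: the sum of the products of paired deviations, divided by the square root of the product of the two sums of squared deviations. A minimal sketch with hypothetical paired data. Python 3.10 and later also provide `statistics.correlation`.

```python
import math

def pearson_r(xs, ys):
    # r = Sxy / sqrt(Sxx * Syy), using deviations from each mean.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]  # hypothetical paired data
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
print(r, r ** 2)  # r = 0.8, so r^2 = 0.64 (up to floating-point rounding)
```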

coefficient of determination

measures the proportion of variation in y (response variable) that can be attributed to x (explanatory variable); r^2

least-squares regression line

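the line of best fit through a scatterplot: the line that minimizes the sum of the squared vertical distances (residuals) between the observed y-values and the values predicted by the line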
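
The least-squares slope and intercept have closed forms: b1 = Sxy / Sxx and b0 = mean(y) - b1 * mean(x). The sketch also computes the coefficient of determination as 1 - SSE/SST, the share of the variation in y explained by the line. For simple linear regression this equals the square of the correlation coefficient r. Function names and data are hypothetical.

```python
def least_squares_line(xs, ys):
    # Closed-form slope and intercept that minimize the sum of squared residuals.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def coefficient_of_determination(xs, ys, slope, intercept):
    # 1 - SSE/SST: proportion of the variation in y explained by the line.
    my = sum(ys) / len(ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst

xs = [1, 2, 3, 4, 5]  # hypothetical paired data
ys = [2, 1, 4, 3, 5]
b1, b0 = least_squares_line(xs, ys)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")  # y-hat = 0.60 + 0.80x
print(round(coefficient_of_determination(xs, ys, b1, b0), 4))  # 0.64
```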

extrapolation

using a regression equation to predict values outside the range of the original data
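
One simple safeguard is to flag any prediction whose x falls outside the range of the x-values used to fit the line. A minimal sketch; `predict`, the line coefficients, and the observed x-values are all hypothetical.

```python
def predict(x, slope, intercept, xs):
    # Returns the predicted y and whether x lies outside the observed x-range,
    # i.e., whether the prediction is an extrapolation.
    y_hat = intercept + slope * x
    is_extrapolation = not (min(xs) <= x <= max(xs))
    return y_hat, is_extrapolation

xs = [1, 2, 3, 4, 5]  # hypothetical observed x-values
print(predict(3, 0.8, 0.6, xs))   # inside the data range: not an extrapolation
print(predict(20, 0.8, 0.6, xs))  # far outside: extrapolation, treat with caution
```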