99 terms

Sampling, Bias and Descriptive Stats

statistics
the science of gathering, describing and analyzing data // or // the numerical descriptions of sample data
population
a group of interest about which one is trying to make an inference
variable
a value that changes among members of the population
data
information gathered about a specific variable or variables
census
data/survey gathered from all members of a population (usually only done if population is small)
parameter
a numerical description of a population characteristic
(sample) statistic
a numerical description of a sample characteristic
sample
a subset of the population
descriptive statistics
gathers, sorts, summarizes and describes data collected
inferential statistics
uses descriptive statistics to estimate population parameters
quantitative data
data that is numerical in form (and the mean is meaningful)
qualitative data
also called categorical data; data that does not take numerical values naturally (or numerical data where the mean is not meaningful, such as jersey numbers)
continuous data
quantitative data that can take on any value in an interval (including decimals and fractions)
discrete data
quantitative data that can take on only countable values, usually integers (such as number of cats in a family)
nominal level of measurement
name data that cannot be intrinsically ordered
ordinal level of measurement
qualitative data that has a natural order, but is non-numerical (the mean is not meaningful)
interval level of measurement
quantitative data that does not have a "true" zero (ratios of values with the same numerical result don't mean the same thing)
ratio level of measurement
quantitative data with a "true" zero (ratios that produce the same result mean the same thing)
observational study
a study that observes natural behavior and records it for analysis without interferences
experimental study
contrives a situation to test a particular variable for causation
representative sample
a sample with the same relevant characteristics as the population
simple random sample
every member of the population (and every possible sample of the chosen size) has an equal chance of being chosen
stratified sample
a population is divided into groups (strata) and then a random sample is taken from each group
cluster sample
a population is divided into clusters, and then the clusters are randomly selected and all members of the chosen clusters are surveyed
systematic sample
a sample is selected by choosing every nth member of the population (depending on how large a sample is needed)
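The four sampling methods above can be sketched in a few lines. This is only an illustration under assumptions: the population of 100 member IDs, the four groups of 25, the sample sizes and the fixed seed are all made up for the example.

```python
import random

# Hypothetical population: 100 member IDs split into 4 groups of 25.
population = list(range(100))
groups = [population[i:i + 25] for i in range(0, 100, 25)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simple random sample: draw 10 members without replacement.
simple = rng.sample(population, 10)

# Stratified sample: treat each group as a stratum and draw 3 from each.
stratified = [m for g in groups for m in rng.sample(g, 3)]

# Cluster sample: treat each group as a cluster, pick 2 whole clusters
# at random, and survey every member of the chosen clusters.
cluster = [m for g in rng.sample(groups, 2) for m in g]

# Systematic sample: random starting point, then every 10th member.
step = 10
start = rng.randrange(step)
systematic = population[start::step]
```

Note the difference between the two grouped designs: the stratified sample takes a few members from every group, while the cluster sample takes every member from a few groups.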
convenience sample
the sample is selected so that it is "convenient" to the researcher and not necessarily representative
cross-sectional study
a study conducted as a snapshot in time
longitudinal study
a study conducted over an extended period of time
meta-analysis
a study that compiles information from previous studies
case study
a study that looks at multiple variables that affect a single event
treatment
one category of a variable controlled by a study (can include placebo)
subjects
the people or things an experiment is conducted on
response variable
the variable measured at the end of an experiment (variable that responds to the treatment)
explanatory variable
the variable that is thought to cause a change in the response variable (treatment variable)
treatment group
the group in a study receiving the active (non-placebo) treatment
control group
the group in a study receiving the placebo
confounding variable
factors other than the explanatory variable that can also affect the response variable
placebo effect
positive response to the suggestion that a subject is being treated even when they are not receiving the active treatment
placebo
inert substance used in place of an active treatment in blind or double-blind studies
nocebo
like a placebo, but with a negative effect on the response variable rather than a positive one
single-blind experiment
researchers interacting with subjects know which subjects are receiving the active treatment or the placebo, but the subjects are not told
double-blind experiment
researchers interacting with subjects do not know which subjects are receiving the active treatment or the placebo, and the subjects don't know either
institutional review board
a group that determines if the conditions of an experimental design are ethically sound and won't harm subjects
informed consent
subjects must know the scope and procedures of a study, including any possible risks, before agreeing to participate
bias
favors a particular outcome
sampling bias
bias in a study created from a non-representative sample
a kind of bias created when participants drop out before the study is complete, or fail to follow all the required procedures
processing errors
errors in data not caused by sampling or other problems, but end up in data due to human or machine error
researcher bias
intentional or unintentional bias created by a researcher not being fully objective or desiring or expecting a particular outcome
response bias
a bias created by respondents to surveys who make errors in their responses or deliberately lie (for instance, in disclosing sensitive information)
participation bias
occurs in voluntary response samples when participation in a study is self-selected and not random
non-response bias
due to lack of response to a randomized survey; very common in modern telephone surveys where selected participants may not answer the phone
a way of describing a particular dataset or population
frequency
counts of data values
class
category, or interval of values for a particular variable
class width
for quantitative variables, the difference between the lower limit and the upper limit of a class for continuous data (or between the lower limit of one class and the lower limit of the next class for discrete data); the range of a class
relative frequency
proportion of a sample in a particular class
cumulative frequency
number (or proportion) of a sample less than or equal to a particular class
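As a rough illustration of frequency, relative frequency and cumulative frequency, the sketch below tabulates a small made-up dataset in which each distinct value is treated as its own class. The data values are assumptions chosen for the example.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample of 10 observations.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
n = len(data)

counts = Counter(data)
classes = sorted(counts)                    # [1, 2, 3, 4, 5]

frequency = [counts[c] for c in classes]    # count in each class
relative = [f / n for f in frequency]       # proportion in each class
cumulative = list(accumulate(frequency))    # count at or below each class

for c, f, rf, cf in zip(classes, frequency, relative, cumulative):
    print(c, f, rf, cf)
```

The relative frequencies sum to 1, and the last cumulative frequency equals the sample size.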
pie chart
a graph showing relative frequencies as proportions of a circle (usually of categorical data)
bar graph
a graph of categorical data that uses bars to represent the frequency in each class/category
Pareto chart
a bar graph sorted by frequency (typically highest to lowest)
histogram
a graph of quantitative data that divides the data into classes of equal width and displays the frequencies in each class with bars
frequency polygon
a line graph that plots frequencies of individual values
ogive graph
a cumulative frequency line graph
stem-and-leaf plot (or stemplot)
similar to a histogram, but displays original data
dot plot
like a histogram, but with dots for individual observations; best used with discrete data with limited outcomes
line graph
two-dimensional data (frequently time vs. another variable) where measurements (dots) are connected with straight lines
uniform distribution
a graph where the heights of the bars (frequencies of classes or values) are approximately the same for all outcomes
symmetric distribution
a graph that has right-left symmetry (typically bell-shaped)
skewed right distribution
a graph with a tail stretching into higher values (on the right)
skewed left distribution
a graph with a tail stretching into lower values (on the left)
mean (arithmetic mean)
sum of the values divided by the number of values (average)
median
the middle value in a distribution (marks 50% below, 50% above)
mode
most common value (for small data sets, report 'no mode' if more than two modes; large datasets can be multi-modal)
range
the difference between the maximum value and the minimum value
the standard deviation
a measure of how much we might expect a typical value in a dataset to differ from the mean
variance
the square of the standard deviation
coefficient of variation
the standard deviation divided by the mean
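The measures of center and spread above can be computed with Python's standard `statistics` module. This is a minimal sketch on a made-up dataset; it assumes the data is a whole population (`pvariance`, `pstdev`). For a sample, `statistics.variance` and `statistics.stdev` divide by n - 1 instead.

```python
import statistics

# Hypothetical dataset, treated here as a full population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)           # sum of values / number of values
median = statistics.median(data)       # middle value of the sorted data
mode = statistics.mode(data)           # most common value
data_range = max(data) - min(data)     # maximum minus minimum

variance = statistics.pvariance(data)  # population variance
std_dev = statistics.pstdev(data)      # square root of the variance
cv = std_dev / mean                    # coefficient of variation

print(mean, median, mode, data_range, variance, std_dev, cv)
```

For this data the mean is 5, the median 4.5, the mode 4, the range 7, the variance 4, the standard deviation 2 and the coefficient of variation 0.4.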
empirical rule // 68-95-99.7 rule
for approximately bell-shaped (normal) distributions: about 68% of data within one standard deviation of the mean; about 95% of data within two standard deviations of the mean; about 99.7% of data within three standard deviations of the mean
percentile
the Pth percentile is the value such that P% of the data is at or below it
decile
n/10 of the data is at or below the nth decile
quartile
n/4 of the data is at or below the nth quartile
5-number summary
minimum, first quartile, median, third quartile, maximum
interquartile range (IQR)
the difference between Q3 and Q1
lower fence
the boundary that marks outliers (below this value) calculated from Q1-1.5IQR
upper fence
the boundary that marks outliers (above this value) calculated from Q3+1.5IQR
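The 5-number summary, IQR and fences can be sketched as follows. The dataset is made up, and the quartiles use the median-of-halves convention (the overall median is left out of both halves when n is odd). Textbooks and software use several different quartile conventions, so other tools may give slightly different Q1 and Q3.

```python
import statistics

# Hypothetical dataset with one unusually large value.
data = sorted([7, 1, 40, 3, 10, 4, 12, 6, 8])
half = len(data) // 2

median = statistics.median(data)
q1 = statistics.median(data[:half])    # median of the lower half
q3 = statistics.median(data[-half:])   # median of the upper half

five_number = (data[0], q1, median, q3, data[-1])
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(five_number, iqr, lower_fence, upper_fence, outliers)
```

Here the summary is (1, 3.5, 7, 11.0, 40), the IQR is 7.5, the fences are -7.75 and 22.25, and 40 is flagged as an outlier.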
standard score
the difference between the observation and the mean, divided by the standard deviation
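A standard score (z-score) is a one-line calculation. In this small sketch the function name `standard_score` and the numbers are assumptions for illustration only.

```python
def standard_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# Hypothetical example: an observation of 85 where the mean is 70 and
# the standard deviation is 10.
z = standard_score(85, 70, 10)
print(z)  # 1.5
```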
box-and-whisker plot // boxplot
a graph created from the 5-number summary (when there are no outliers)
scatterplot
a two-dimensional plot of points not connected with lines
correlation coefficient
also called the Pearson correlation coefficient, measures the strength of the linear relationship between two variables; values range from -1 to 1 (-1 ≤ r ≤ 1)
coefficient of determination
measures the proportion of variation in y (response variable) that can be attributed to x (explanatory variable); r^2
least-squares regression line
the line of best fit through a scatterplot
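The correlation coefficient, coefficient of determination and least-squares regression line can all be built from the same three sums of deviations. The sketch below uses a made-up set of five (x, y) pairs; the variable names are illustrative.

```python
import math

# Hypothetical paired data: x is explanatory, y is the response.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squared deviations and cross-products.
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
r_squared = r ** 2               # coefficient of determination

# Least-squares line: y-hat = intercept + slope * x
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(round(r, 4), round(r_squared, 4), round(slope, 4), round(intercept, 4))
```

For this data the line is roughly y-hat = 2.2 + 0.6x with r^2 of about 0.6, meaning about 60% of the variation in y can be attributed to x.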
extrapolation
using a regression equation to predict values outside the range of the original data
fair game
a game where the expected value is 0 for all players
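Whether a game is fair can be checked by summing probability times net payoff over all outcomes. The game below is made up for the example: a player pays 1 to bet on one face of a fair six-sided die and gains a net 5 on a win.

```python
from fractions import Fraction

# Hypothetical game from the player's point of view:
# (probability, net payoff) for each outcome.
outcomes = [
    (Fraction(1, 6), 5),    # chosen face comes up: net gain of 5
    (Fraction(5, 6), -1),   # any other face: lose the 1 that was bet
]

# Expected value = sum of probability * payoff over all outcomes.
expected_value = sum(p * payoff for p, payoff in outcomes)
is_fair = expected_value == 0

print(expected_value, is_fair)  # 0 True
```

Using `Fraction` keeps the arithmetic exact, so the comparison with 0 is not affected by floating-point rounding.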
block design
a study design where participants are broken up into groups (typically demographic groups like race or gender, or groups based on some behavior), and then each group is divided into treatment and placebo groups for the experiment. This is done to help control for possible confounding variables.
quota sampling
a systematic effort to force the sample to be representative of relevant subgroups of a population through the use of quotas (similar to stratified sampling, but members within each quota are not chosen at random)
sampling frame
the source (usually a list) of population members from which a sample is drawn
selection bias
a sample has a built-in tendency to over-include or under-include a particular group in the population
proxy
something that allows you to estimate a value or count indirectly
