###
raw data

numbers and categories that have been collected but have not yet been processed in any way

###
variable

a characteristic that can differ from one individual to the next

###
observational unit

a single individual who participates in a study

###
sample data

measurements that are taken from a subset of a population

###
sample size

the total number of observational units

###
dataset

the complete set of raw data, for all observational units and variables in a survey or experiment

###
population data

measurements that are taken from all individuals of a population

###
statistic

a summary measure computed from sample data

###
parameter

a summary measure for an entire population

###
descriptive statistics

the summary numbers for either a population or sample

###
categorical variable

a variable whose values are group or category names that don't necessarily have any logical ordering; each individual falls into only one category

###
ordinal variable

a categorical variable whose categories have a logical ordering

###
quantitative variable

raw data that are recorded as numerical values (either measurements or counts)

###
continuous variable

a type of quantitative variable that is used when every value within some interval is a possible result

###
explanatory variable

a variable whose value for an individual is thought to partially explain the value of the response variable for that same individual

###
response variable

a variable whose value for an individual is thought to be partially explained by the explanatory variable; the outcome of interest

###
frequency

count of how many observations fall into a category

###
relative frequency

the proportion or percentage in a category relative to the total count over all categories

###
frequency distribution

the listing of all categories along with their frequencies

###
relative frequency distribution

a listing of all categories along with their relative frequencies (given as proportions or percentages)
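
A minimal sketch of how a frequency distribution and a relative frequency distribution could be built from raw categorical data. The `eye_colors` list is hypothetical data invented for illustration.

```python
from collections import Counter

# Hypothetical raw data: one categorical value per observational unit.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

# Frequency distribution: every category with its count.
frequency = Counter(eye_colors)

# Relative frequency distribution: each count divided by the total count.
total = sum(frequency.values())
relative_frequency = {
    category: count / total for category, count in frequency.items()
}

for category, count in frequency.items():
    print(category, count, round(relative_frequency[category], 3))
```

The relative frequencies, written as proportions, add up to 1 across all categories.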

###
pie charts

visual representations that are useful for summarizing a single categorical variable if there aren't too many categories

###
bar graphs

visual representations that are useful for summarizing one or two categorical variables; especially useful for comparing two categorical variables

###
distribution

the overall pattern of how often the possible values occur

###
location

on a distribution, this is represented by the center or the average

###
median

the middle value of the ordered data; with an even number of values, the average of the two middle values

###
mean

the arithmetic average of data
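
A small sketch of computing the mean and the median by hand, checked against Python's standard `statistics` module. The two lists are hypothetical measurements, and the helper names `mean_by_hand` and `median_by_hand` are invented for illustration.

```python
import statistics

# Hypothetical measurements, invented for illustration.
odd_count = [7, 3, 9, 4, 12]
even_count = [7, 3, 9, 4, 12, 5]

def mean_by_hand(values):
    """Arithmetic average: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

def median_by_hand(values):
    """Middle value of the sorted data; average of the two middle values if even."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

for data in (odd_count, even_count):
    print(mean_by_hand(data), median_by_hand(data))
    # The standard library agrees with the hand computation.
    assert median_by_hand(data) == statistics.median(data)
```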

###
variability

on a distribution, the spread among individual measurements

###
shape

on a distribution, the overall form of the graph, for example whether values are clumped in the middle or skewed to one side

###
outliers

data points that are not consistent with the bulk of the data

###
histogram

similar to a bar graph, but for a quantitative variable: each bar shows how many values fall in an interval; not very informative when the sample size is small

###
stem-and-leaf plots

present all individual values; can be overwhelming for large datasets

###
boxplot

displays the five-number summary (minimum, lower quartile, median, upper quartile, maximum); useful for comparing multiple groups and identifying outliers

###
right

a graph is skewed to the _______ if higher values are more spread out than lower values

###
left

a graph is skewed to the _______ if lower values are more spread out than higher values

###
mode

the most frequent value

###
unimodal

describes a distribution with a single prominent peak in its histogram, stemplot, or dotplot

###
range

the highest value minus the lowest value

###
interquartile range

upper quartile - lower quartile
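
A sketch of the range and the interquartile range on hypothetical data. Textbooks and software use several slightly different rules for locating quartiles; the `method="inclusive"` option of `statistics.quantiles` used here is one such convention and is an assumption, not necessarily the one these cards follow.

```python
import statistics

# Hypothetical data, invented for illustration.
data = [1, 3, 4, 6, 8, 9, 12, 15, 20]

# Range: the highest value minus the lowest value.
data_range = max(data) - min(data)

# Quartiles under the "inclusive" convention (an assumed choice of rule).
lower_q, median, upper_q = statistics.quantiles(data, n=4, method="inclusive")

# Interquartile range: upper quartile minus lower quartile.
iqr = upper_q - lower_q

print("range:", data_range)
print("quartiles:", lower_q, median, upper_q)
print("IQR:", iqr)
```

A different quartile convention can give a slightly different interquartile range for the same data, especially in small samples.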

###
resistant statistic

a numerical summary of the data that is "resistant" to the influence of outliers, meaning outliers won't have a major influence on a statistic's numerical value
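
A brief sketch of what "resistant" means in practice: the same hypothetical data is summarized with and without one extreme value. The median barely moves, while the mean shifts a lot. The numbers are invented for illustration.

```python
import statistics

# Hypothetical data, then the same data with one extreme outlier added.
values = [10, 12, 13, 15, 16]
with_outlier = values + [200]

mean_shift = statistics.mean(with_outlier) - statistics.mean(values)
median_shift = statistics.median(with_outlier) - statistics.median(values)

print("change in mean:  ", round(mean_shift, 2))
print("change in median:", median_shift)
```

In this example the median is the resistant statistic and the mean is not.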

###
first summary number

the mean of a bell-shaped distribution

###
second summary number

the standard deviation of a bell-shaped distribution

###
standard deviation

a measure of the spread of values, represented by s; roughly the average distance that values fall from the mean

###
variance

the squared value of the standard deviation
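
A sketch of the variance and standard deviation computed step by step. It assumes the sample version (dividing by n - 1), which is what the symbol s usually denotes; the data are hypothetical.

```python
import math
import statistics

# Hypothetical sample data, invented for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mean = sum(values) / n

# Sum of squared deviations from the mean.
squared_deviations = sum((v - mean) ** 2 for v in values)

# Sample variance divides by n - 1 (assumed convention for s).
variance = squared_deviations / (n - 1)

# Standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print("variance:", round(variance, 3))
print("standard deviation:", round(std_dev, 3))

# The standard library's sample functions agree with the hand computation.
assert math.isclose(variance, statistics.variance(values))
assert math.isclose(std_dev, statistics.stdev(values))
```

Squaring the standard deviation gives back the variance, matching the definition on this card.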

###
empirical rule

for bell-shaped distributions, approximately 68% of values fall within 1 standard deviation of the mean in either direction; approximately 95% of values fall within 2 standard deviations of the mean in either direction; approximately 99.7% of values fall within 3 standard deviations of the mean in either direction
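
A rough check of the empirical rule on simulated bell-shaped data. The mean of 100, standard deviation of 15, sample size, and seed are arbitrary assumptions chosen for illustration; the observed shares should land close to 68%, 95%, and 99.7% but will not match them exactly.

```python
import random
import statistics

# Simulated bell-shaped data (assumed mean 100, standard deviation 15).
rng = random.Random(0)  # fixed seed so reruns use the same sample
values = [rng.gauss(100, 15) for _ in range(50_000)]

mean = statistics.fmean(values)
sd = statistics.stdev(values)

def share_within(k):
    """Proportion of values within k standard deviations of the mean."""
    inside = sum(1 for v in values if abs(v - mean) <= k * sd)
    return inside / len(values)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.3f}")
```

For data that are strongly skewed or have several peaks, these shares can differ noticeably from the rule.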

