44 terms

# Chapter 2

raw data
numbers and categories that have been collected but have not yet been processed in any way
variable
a characteristic that can differ from one individual to the next
observational unit
a single individual who participates in a study
sample data
measurements that are taken from a subset of a population
sample size
the total number of observational units in a sample or dataset
dataset
the complete set of raw data, for all observational units and variables in a survey or experiment
population data
measurements that are taken from all individuals of a population
statistic
a summary measure computed from sample data
parameter
a summary measure for an entire population
descriptive statistics
the summary numbers for either a population or sample
categorical variable
consists of group or category names that don't necessarily have any logical ordering; each individual falls into only one category
ordinal variable
a categorical variable whose categories have a logical order
quantitative variable
raw data that are recorded as numerical values (either measurements or counts)
continuous variable
a type of quantitative variable that is used when every value within some interval is a possible result
explanatory variable
a variable whose value for an individual is thought to partially explain the value of the response variable for that same individual
response variable
a variable whose value for an individual is thought to be explained, at least in part, by the explanatory variable
frequency
count of how many observations fall into a category
relative frequency
the proportion or percentage in a category relative to the total count over all categories
frequency distribution
the listing of all categories along with their frequencies
relative frequency distribution
a listing of all categories along with their relative frequencies (given as proportions or percentages)
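
As a small illustration of the four frequency terms above, here is a minimal Python sketch. The `eye_colors` list is made-up data, not from the chapter; `Counter` comes from Python's standard library.

```python
from collections import Counter

# Hypothetical raw data: one categorical value per observational unit.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "blue"]

# Frequency distribution: every category listed with its count.
frequency = Counter(eye_colors)

# Relative frequency distribution: each count divided by the total count.
total = sum(frequency.values())
relative_frequency = {category: count / total for category, count in frequency.items()}

for category, count in frequency.items():
    print(f"{category}: frequency={count}, relative frequency={relative_frequency[category]:.3f}")
```

The relative frequencies always add up to 1 (or 100% when written as percentages), which is a quick check that no category was missed.
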
pie charts
visual representations that are useful for summarizing a single categorical variable if there aren't too many categories
bar graphs
visual representations that are useful for summarizing one or two categorical variables; especially useful for comparing two categorical variables
distribution
the overall pattern of how often the possible values occur
location
on a distribution, this is represented by the center or the average
median
the middle value of the ordered data; with an even number of values, the average of the two middle values
mean
the arithmetic average of data
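
A quick sketch of median and mean in Python, using the standard `statistics` module. Both lists are invented for illustration; the second one has an even number of values, so its median is the average of the two middle values.

```python
import statistics

# Hypothetical quantitative data (not from the chapter).
odd_values = [4, 8, 1, 9, 3]        # sorted: 1, 3, 4, 8, 9
even_values = [4, 8, 1, 9, 3, 11]   # sorted: 1, 3, 4, 8, 9, 11

# Mean: the sum of the values divided by how many there are.
odd_mean = statistics.mean(odd_values)
even_mean = statistics.mean(even_values)

# Median: the middle value once the data are put in order.
odd_median = statistics.median(odd_values)    # single middle value
even_median = statistics.median(even_values)  # average of the two middle values

print(odd_mean, odd_median)
print(even_mean, even_median)
```
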
variability
on a distribution, the spread among individual measurements
shape
on a distribution, the overall form of the graph, such as symmetric, skewed, or clumped
outliers
data points that are not consistent with the bulk of the data
histogram
similar to a bar graph, but for a quantitative variable; not very informative when the sample size is small
stem-and-leaf plots
present all individual values; can be overwhelming for large datasets
boxplot
displays the five-number summary (minimum, lower quartile, median, upper quartile, maximum); useful for comparing multiple groups and identifying outliers
right
a graph is skewed to the _______ if higher values are more spread out than lower values
left
a graph is skewed to the _______ if lower values are more spread out than higher values
mode
the most frequent value
unimodal
describes a distribution with a single prominent peak in its histogram, stemplot, or dotplot
range
the highest value minus the lowest value
interquartile range
upper quartile - lower quartile
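
To make range and interquartile range concrete, here is a minimal Python sketch. The `quartiles` helper is hypothetical and assumes one common textbook convention (the quartiles are the medians of the lower and upper halves of the ordered data); statistical software often interpolates differently and can report slightly different quartiles. The data are invented.

```python
import statistics

def quartiles(values):
    """Return (lower quartile, upper quartile) as medians of the two halves.

    Assumed convention: when the number of values is odd, the middle value
    is left out of both halves. Other conventions exist.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to form two halves")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return statistics.median(lower_half), statistics.median(upper_half)

# Hypothetical data; sorted: 1, 3, 5, 7, 9, 11, 13, 15
data = [7, 1, 5, 3, 9, 11, 13, 15]

# Range: the highest value minus the lowest value.
data_range = max(data) - min(data)

# Interquartile range: upper quartile minus lower quartile.
q1, q3 = quartiles(data)
iqr = q3 - q1

print(f"range={data_range}, lower quartile={q1}, upper quartile={q3}, IQR={iqr}")
```
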
resistant statistic
a numerical summary of the data that is "resistant" to the influence of outliers, meaning outliers won't have a major influence on a statistic's numerical value
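
One way to see resistance in action is to add a single extreme value to a small dataset and compare how much the mean and the median move. The numbers below are made up for illustration.

```python
import statistics

# Hypothetical data, then the same data with one extreme value added.
values = [10, 12, 13, 15, 20]
with_outlier = values + [200]

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)

median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

# The mean is pulled strongly toward the outlier; the median barely changes.
print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

In this example the mean jumps from 14 to 45 while the median only moves from 13 to 14, so the median behaves as a resistant statistic and the mean does not.
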
first summary number
the mean of a bell-shaped distribution
second summary number
the standard deviation of a bell-shaped distribution
standard deviation
a measure of the spread of values, represented by s for sample data; roughly the average distance that values fall from the mean
variance
the squared value of the standard deviation
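
A minimal sketch of how the sample variance and standard deviation can be computed step by step. It assumes the usual sample convention of dividing by n - 1 (which is what the symbol s normally denotes); the data are invented. The last two lines cross-check the result against Python's `statistics` module.

```python
import math
import statistics

# Hypothetical sample data (not from the chapter).
values = [1, 3, 5, 5, 6]
n = len(values)

# Step 1: the mean.
mean_value = sum(values) / n

# Step 2: squared distance of each value from the mean.
squared_deviations = [(x - mean_value) ** 2 for x in values]

# Step 3: sample variance, dividing by n - 1 (assumed sample convention).
variance = sum(squared_deviations) / (n - 1)

# Step 4: the standard deviation is the square root of the variance.
s = math.sqrt(variance)

print(f"mean={mean_value}, variance={variance}, s={s}")

# Cross-check with the standard library, which uses the same n - 1 convention.
library_variance = statistics.variance(values)
library_s = statistics.stdev(values)
```
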
empirical rule
for a bell-shaped distribution, approximately 68% of values fall within 1 standard deviation of the mean in either direction, approximately 95% within 2 standard deviations, and approximately 99.7% within 3 standard deviations
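
The percentages in the empirical rule can be checked against the normal curve, taken here as the idealized bell shape (an assumption; real data only follow the rule approximately). This sketch uses `NormalDist` from Python's standard `statistics` module; `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s) of the mean: {share_within(k):.1%}")
```

The printed shares come out near 68%, 95%, and 99.7%, which is where the rounded figures in the rule come from.
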