the science of gaining information from numerical data
group of interest
subgroup of the population meant to represent the population
objects being described by data set
characteristic of an individual
values are labels and categories
values are numerical
distribution of a variable
tells what values the variable assumes and how often it takes these values
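A minimal sketch of a distribution as "values and how often", using only the standard library. The `siblings` data is made up for illustration.

```python
from collections import Counter

# Hypothetical data: number of siblings reported by ten students.
siblings = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0]

# Count how often each value occurs.
counts = Counter(siblings)

# Map each value to (count, share of all observations).
distribution = {
    value: (counts[value], counts[value] / len(siblings))
    for value in sorted(counts)
}

for value, (count, share) in distribution.items():
    print(f"{value}: {count} times ({share:.0%})")
```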
simple plot that allows one to visualize a relatively small data set. not convenient if data set is large.
common distribution graph for one-variable data. area of each bar represents the percent of observations in that bin. shows distribution of values of a quantitative variable. choice of bin width changes how histogram looks
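To see how bin width changes the picture, here is a rough sketch that counts the same data with two widths and prints text bars. The helper `bin_counts` and the `ages` data are hypothetical, and bins are assumed to start at multiples of the width.

```python
from collections import Counter

def bin_counts(data, width):
    """Count observations in bins [k*width, (k+1)*width), keyed by left edge."""
    counts = Counter((x // width) * width for x in data)
    return dict(sorted(counts.items()))

# Hypothetical data: ages of ten people.
ages = [21, 22, 23, 25, 29, 31, 34, 38, 41, 47]

narrow = bin_counts(ages, 5)   # six narrow bins
wide = bin_counts(ages, 10)    # three wide bins

for label, bins, width in (("width 5", narrow, 5), ("width 10", wide, 10)):
    print(label)
    for left, count in bins.items():
        print(f"  {left}-{left + width - 1}: {'#' * count}")
```

The narrow bins show a peak in the low 20s, while the wide bins smooth it into a steady decline.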
compares sizes of different items, used to display frequencies related to categorical variables
observation that falls outside the overall pattern of a data set
described by center, spread, shape, outliers
described by mean, median, mode
how far the values extend, from smallest number to largest number. described by quartiles, range, interquartile range
symmetric, skewed to the right, skewed to the left
any value more than 1.5 × IQR above Q3 or more than 1.5 × IQR below Q1
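The 1.5 × IQR rule can be sketched as follows. This assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data (middle value left out when n is odd); other quartile conventions give slightly different fences. The `scores` data and function names are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumes at least two observations; the middle value is excluded
    from both halves when the number of observations is odd.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])

def find_outliers(data):
    """Return values beyond 1.5 * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data: Q1 = 4.5, Q3 = 8.5, IQR = 4, fences at -1.5 and 14.5.
scores = [2, 4, 5, 6, 7, 8, 9, 30]
outliers = find_outliers(scores)
print(outliers)
```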
pulled toward the tail in a skewed distribution. influenced by outliers.
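This card appears to describe the mean. A small sketch, with made-up income data, of how one large value drags the mean toward the right tail while the median barely moves:

```python
from statistics import mean, median

# Hypothetical incomes in thousands, roughly symmetric.
incomes = [30, 32, 35, 36, 38]

# Add one very large value to create a long right tail.
with_outlier = incomes + [250]

mean_shift = mean(with_outlier) - mean(incomes)
median_shift = median(with_outlier) - median(incomes)

print(f"mean moved by {mean_shift:.1f}, median moved by {median_shift:.1f}")
```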
effective for small data sets. include a key. split stems divide each stem into two or more rows. back-to-back = two stemplots sharing one set of stems, to compare two groups
5 number summary
min, q1, median, q3, max
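A minimal sketch of computing the five-number summary, assuming the textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data. The `times` data and the function name are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); needs at least two observations."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median(ordered[:half])    # median of the lower half
    q3 = median(ordered[-half:])   # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]

# Hypothetical data: sorted it reads 3, 5, 7, 9, 10, 12, 15.
times = [12, 7, 3, 9, 15, 5, 10]
summary = five_number_summary(times)
minimum, q1, med, q3, maximum = summary

print(summary)
print("range:", maximum - minimum, "IQR:", q3 - q1)
```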
middle value of data set. not influenced by outliers
largest value - smallest value. influenced by outliers.
Q3-Q1. not influenced by outliers.
number computed from a sample
number computed from a population
measure of spread. variance = stdev^2. two different formulas depending on population (divide by n) or sample (divide by n - 1). if s = 0, there is no spread, so all observations are the same. use for symmetric data, not for skewed data
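The population and sample versions can be compared with Python's `statistics` module: `pstdev` and `pvariance` divide by n, while `stdev` and `variance` divide by n - 1. The data below is made up for illustration.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data with mean 5; squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = pstdev(data)   # sqrt(32 / 8)
sample_sd = stdev(data)        # sqrt(32 / 7), slightly larger

print(f"population sd: {population_sd:.3f}, sample sd: {sample_sd:.3f}")

# Variance is the square of the standard deviation in both versions.
print(math.isclose(population_sd ** 2, pvariance(data)))
print(math.isclose(sample_sd ** 2, variance(data)))

# Identical observations have no spread, so s = 0.
no_spread = stdev([3, 3, 3])
```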