stem & leaf plot

a graph of a distribution of quantitative data in which all but the final digits of data values (written in numerical order) form a column called a stem and the final digit of each data value is written in increasing order outward from the column to form leaves
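
The construction described above can be sketched in a few lines of Python. This is only an illustration: the function name `stemplot` and the data are made up, and it assumes non-negative whole numbers so that the final digit is the leaf.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a simple stem-and-leaf plot (non-negative integers assumed)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # leaf = final digit, stem = all other digits
        groups[stem].append(leaf)
    lines = []
    # List every stem from smallest to largest so that gaps stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

scores = [72, 85, 91, 68, 77, 85, 93, 70, 88, 79]  # made-up data
print("\n".join(stemplot(scores)))
```

Because the values are sorted first, the leaves on each stem come out in increasing order, as the definition requires.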

back-to-back stem & leaf plot

a stem-and-leaf plot or stemplot that is used to compare distributions of quantitative variables for two data sets

dotplot

a graph of a distribution of quantitative data in which each data value is shown as a dot above its location on a number line
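
A rough text version of a dotplot, as a sketch only: the helper name `dotplot` and the data are hypothetical, and it assumes small whole-number values so each value gets a three-character column on the number line.

```python
from collections import Counter

def dotplot(values):
    """Return the rows of a text dotplot (small integer values assumed)."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    height = max(counts.values())
    rows = []
    # Build the plot from the top row down; a dot appears where the count reaches that level.
    for level in range(height, 0, -1):
        row = "".join(" * " if counts[x] >= level else "   " for x in range(lo, hi + 1))
        rows.append(row.rstrip())
    # The last row is the number line itself.
    rows.append("".join(f"{x:^3}" for x in range(lo, hi + 1)).rstrip())
    return rows

siblings = [0, 1, 1, 2, 2, 2, 3, 5]  # made-up data
print("\n".join(dotplot(siblings)))
```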

histogram

a graph of a distribution of quantitative data in which nearby values are grouped together in what are often called "bins" or "classes"
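
The grouping into bins can be sketched as follows. The function `bin_counts`, the bin width of 10, and the data are assumptions chosen for illustration; a real histogram would draw bars whose heights are these counts.

```python
def bin_counts(values, start, width, n_bins):
    """Count the values falling in each bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)  # index of the bin that contains v
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

scores = [72, 85, 91, 68, 77, 85, 93, 70, 88, 79]  # made-up data
counts = bin_counts(scores, start=60, width=10, n_bins=4)
for i, c in enumerate(counts):
    low = 60 + 10 * i
    print(f"{low}-{low + 9}: {'*' * c}")
```

With these settings the four bins (60s, 70s, 80s, 90s) hold 1, 4, 3, and 2 values.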

sample mean

the average of a subset of the population, denoted by x-bar: x̄

population mean

the average of a population, denoted by the Greek letter mu: μ
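
Written as formulas, the sample mean and population mean are computed the same way and differ only in which values are averaged. The symbols n (sample size) and N (population size) are notation assumed here, not taken from the cards.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\qquad
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
```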

resistant

a statistic that is not influenced by extreme observations
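
A small sketch of what resistance means in practice, using made-up salary data: the median is resistant, while the mean is not.

```python
from statistics import mean, median

salaries = [40, 42, 45, 47, 50]   # made-up data, in thousands
with_outlier = salaries + [400]   # the same data plus one extreme observation

print("mean:  ", mean(salaries), "->", mean(with_outlier))
print("median:", median(salaries), "->", median(with_outlier))
```

Adding the single extreme value moves the mean from 44.8 to 104, while the median only moves from 45 to 46.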

unimodal

describes the shape of a distribution whose graph has one major peak

bimodal

describes the shape of a distribution whose graph has two major peaks

symmetric

describes the shape of a distribution whose graph has roughly mirror images on the left and right sides

skewed

describes the shape of a distribution whose graph has one side (tail) that is much longer than the other; the distribution is named for the direction of the longer tail (skewed left or skewed right)

quartiles

One quarter of the data values are smaller than the first quartile (Q1) and three quarters of the data values are smaller than the third quartile (Q3)

interquartile range (IQR)

the range of the middle 50% of the data values: the distance between the quartiles (Q3 - Q1)

1.5 IQR Rule

used for identifying outliers: any values that are more than 1.5 times the IQR lower than the first quartile or higher than the third quartile are called outliers

standard deviation

measures the typical distance of the observations from their mean. It is calculated by finding an average of the squared distances from the mean and then taking the square root.

*usually denoted s for a sample or lower case Greek sigma, σ, for a population

variance

the square of the standard deviation (σ^2)
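
A minimal sketch of the standard deviation and variance calculations, assuming the usual convention that the sample version divides by n - 1 and the population version divides by n. The function names and data are made up; Python's `statistics.stdev` and `statistics.pstdev` compute the same two quantities.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

def sample_sd(values):
    """Sample standard deviation s: squared distances from the mean, divided by n - 1."""
    m = mean(values)
    return sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

def population_sd(values):
    """Population standard deviation sigma: squared distances from the mean, divided by n."""
    m = mean(values)
    return sqrt(sum((x - m) ** 2 for x in values) / len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
print(sample_sd(data), stdev(data))        # the two sample values agree
print(population_sd(data), pstdev(data))   # the two population values agree
print(population_sd(data) ** 2)            # variance = standard deviation squared
```

For this data the mean is 5 and the squared distances sum to 32, so the population variance is 4 and σ is 2, while s is about 2.14.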

distribution

tells all the possible values of a variable and how often they each occur

two-way table

array that displays counts of two categorical variables with columns indicating the distribution for one variable and rows indicating the distribution for the other variable

marginal distribution

in a two-way table, the distribution of values of one of the categorical variables among all individuals described by the table

conditional distribution

in a two-way table, this describes the values of a variable among individuals who have a specific value of another variable (there's a separate conditional distribution for each value of the other variable)
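
Marginal and conditional distributions can both be read off a two-way table of counts. The sketch below uses a made-up table stored as a nested dictionary; the function names are hypothetical.

```python
# Made-up two-way table of counts: rows are grade levels, columns are preferred sports.
table = {
    "junior": {"soccer": 20, "tennis": 30},
    "senior": {"soccer": 40, "tennis": 10},
}

def marginal_rows(table):
    """Marginal distribution of the row variable: row totals over the grand total."""
    total = sum(sum(row.values()) for row in table.values())
    return {r: sum(row.values()) / total for r, row in table.items()}

def conditional_given_row(table, r):
    """Conditional distribution of the column variable among individuals in row r."""
    row_total = sum(table[r].values())
    return {c: count / row_total for c, count in table[r].items()}

print(marginal_rows(table))
for r in table:
    print(r, conditional_given_row(table, r))
```

Here 40% of juniors but 80% of seniors prefer soccer. Because the conditional distributions differ from row to row, knowing the grade level helps predict the sport.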

association

two variables have this quality if knowing the value of one variable helps predict the value of the other

center

a typical value: could be the median, the mean, or the mode

spread

how much a data set varies: could be the range, the IQR, or the standard deviation

shape

a description of the symmetry or asymmetry and the number of modes/peaks

outlier

any data value that is unusually low or unusually high

bar graph

a graph of a distribution of categorical data in which bars extend to display the frequency of various categories that are placed on an axis

median

the midpoint of a distribution, such that about half the observations are smaller and half the observations are larger
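
Finding the midpoint can be sketched as below, assuming the common convention of averaging the two middle values when the number of observations is even. The function name `median_of` is made up.

```python
def median_of(values):
    """Return the middle value of the sorted data (mean of the two middle values if n is even)."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                     # odd n: a single middle value
    return (data[mid - 1] + data[mid]) / 2   # even n: average the two middle values

print(median_of([3, 1, 2]))
print(median_of([4, 1, 3, 2]))
```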

range

the distance (a single number) between the minimum and maximum values in a data set (max - min)

boxplot (or box and whisker plot)

a graph of a distribution of quantitative data in which a central box extends between the quartiles, with a central line marking the median; lines (called whiskers) extend to the largest and smallest values that are not outliers, and outliers are marked with individual points/dots

categorical variable

a variable that places an individual into one of several groups or categories

quantitative variable

a variable that takes numerical values that can be averaged in a way that makes sense in context

What are three ways you can display categorical data?

pie charts, bar graphs, two-way tables

segmented bar graph

a bar graph that is used to compare distributions of a categorical variable for two or more data sets; *the segments of each bar add up to 100%

What does SOCS stand for?

Shape, Outliers, Center, Spread

unimodal has ___ peak(s), bimodal has ___ peak(s)

1;2

A symmetric, unimodal distribution has approximately the same ____, ______, and ____

mean, median, mode

When should you use mean as the center?

when you have a roughly symmetric distribution with no outliers

When should you use the median as the center?

when you have a skewed/non-symmetric distribution or one with outliers

What are three ways you can use to measure spread?

range, standard deviation, and interquartile range

What's the difference between histograms and bar graphs?

histograms are used with quantitative data, bar graphs are used with categorical data

five-number summary

The minimum value, lower quartile, median, upper quartile, and maximum value for a data set. These five values give a summary of the shape of the distribution and are used to make box plots.

The five numbers that help describe the center, spread and shape of data

population

the entire group of individuals that we want information about

sample

a subset of the population from which data are actually collected; ideally it is representative of the entire population

variable

A variable is any characteristic whose value may change from one individual to another

univariate data set

observations on a single variable made on individuals in a sample or population.

bivariate data set

observations on two variables made on individuals in a sample or population

multivariate data set

observations on two or more variables made on individuals in a sample or population

frequency

the number of times the category appears in the data set.

relative frequency

the fraction or proportion of the time that the category appears in the data set. It equals frequency/number of observations in the data set.
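
Frequency and relative frequency can be tallied as in this short sketch, which uses made-up categorical data.

```python
from collections import Counter

colors = ["red", "blue", "red", "green", "blue", "red", "red", "blue"]  # made-up data

freq = Counter(colors)   # frequency: how many times each category appears
n = len(colors)          # number of observations in the data set
rel_freq = {category: count / n for category, count in freq.items()}

for category in freq:
    print(category, freq[category], rel_freq[category])
```

The relative frequencies (0.5, 0.375, and 0.125 here) always add up to 1.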