33 terms

outlier

an individual value that falls outside the overall pattern

-points that are clearly apart from the body of the data, not just the most extreme observations in a distribution

midpoint

center of a distribution

unimodal

a distribution with one major peak

skewed to the right

right tail (larger values) is much longer than the left tail (smaller values)

-skewed to the left: the left tail (smaller values) is much longer than the right tail (larger values)

time plot

plots each observation against the time at which it was measured.

categorical variable

places each individual into a category

-e.g. male or female

-displayed by bar graphs and pie charts

quantitative variable

numerical values that measure some characteristic of each case

-e.g. height in cm, or annual salary in $

-displayed by stemplots and histograms

exploratory data analysis

uses graphs and numerical summaries to describe the variables in a data set and the relations among them

distribution

tells what values a variable takes and how often it takes these values

histogram

plots the counts (frequencies) or percents of observations that fall in each of a set of equal-width classes of values

upper quartile

the median of the upper half of the data

-the lower quartile is the median of the lower half of the data

pth percentile

in a distribution, the value such that p percent of the observations fall at or below it

five-number summary

set of observations that consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest

-Minimum, Q1, M, Q3, Maximum
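
A minimal Python sketch of the five-number summary, assuming the quartile convention used on these cards (Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when the number of observations is odd). The function name `five_number_summary` and the data are made up for illustration; other software may use slightly different quartile rules.

```python
from statistics import median

def five_number_summary(data):
    """Return (Minimum, Q1, M, Q3, Maximum) for a list of at least 2 numbers."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[:n // 2]            # observations to the left of the median's position
    upper = xs[n // 2 + n % 2:]    # observations to the right of the median's position
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

scores = [7, 1, 9, 3, 5, 11, 13]   # made-up data
print(five_number_summary(scores))  # (1, 3, 7, 11, 13)
```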

boxplot

a graph of the five-number summary

-a central box spans the quartiles Q1 and Q3

-a line in the box marks the median M

-lines extend from the box out to the smallest and largest observations

interquartile range (IQR)

the distance between the first and third quartiles, IQR = Q3 - Q1

the 1.5 x IQR rule for outliers

an observation is a suspected outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile
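
The rule can be sketched in Python as follows. This is an illustration under the assumption that quartiles are the medians of the lower and upper halves of the sorted data; the name `suspected_outliers` and the data are hypothetical.

```python
from statistics import median

def suspected_outliers(data):
    """Return the observations flagged by the 1.5 x IQR rule."""
    xs = sorted(data)
    n = len(xs)
    q1 = median(xs[:n // 2])             # median of the lower half
    q3 = median(xs[n // 2 + n % 2:])     # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr           # anything below this is suspect
    high_fence = q3 + 1.5 * iqr          # anything above this is suspect
    return [x for x in xs if x < low_fence or x > high_fence]

data = [2, 4, 5, 6, 7, 8, 9, 30]         # made-up data
print(suspected_outliers(data))          # [30]
```

Here Q1 = 4.5 and Q3 = 8.5, so IQR = 4 and the fences sit at -1.5 and 14.5; only 30 falls outside them.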

modified boxplot

a plot where suspected outliers are identified individually

standard deviation

measures the spread by looking at how far the observations are from their mean

-square root of the variance

-best for reasonably symmetric distributions that are free of outliers

variance (s squared)

the average of the squared deviations of the observations from their mean, using n-1 rather than n as the divisor

degrees of freedom

n-1, degrees of freedom of the variance or standard deviation
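
A short Python sketch tying the last three cards together: the variance divides the sum of squared deviations by n-1, and the standard deviation is its square root. The helper name `sample_variance` and the data are made up; the standard library's `statistics.variance` and `statistics.stdev` use the same n-1 divisor and are shown only as a cross-check.

```python
import math
from statistics import mean, variance, stdev

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = mean(data)
    squared_deviations = [(x - x_bar) ** 2 for x in data]
    return sum(squared_deviations) / (len(data) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data, mean is 5
s_squared = sample_variance(data)  # 32 / 7
s = math.sqrt(s_squared)           # standard deviation

# The library functions should agree with the hand computation.
print(math.isclose(s_squared, variance(data)), math.isclose(s, stdev(data)))
```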

linear transformation

a change in the measurement unit, of the form x_new = a + bx

-does not change the shape of a distribution

-the center and spread do change, however
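
As an illustration, the sketch below converts made-up Celsius readings to Fahrenheit, assuming the usual form x_new = a + b*x with a = 32 and b = 1.8. It checks that the mean is transformed the same way as the data, while the standard deviation is only multiplied by |b| (adding a shifts the center but not the spread).

```python
import math
from statistics import mean, stdev

celsius = [10.0, 15.0, 20.0, 25.0, 30.0]      # made-up readings
a, b = 32.0, 1.8                              # Fahrenheit = 32 + 1.8 * Celsius
fahrenheit = [a + b * x for x in celsius]

# Center: the mean follows the same transformation as each observation.
print(math.isclose(mean(fahrenheit), a + b * mean(celsius)))
# Spread: the standard deviation is multiplied by |b|; a has no effect.
print(math.isclose(stdev(fahrenheit), abs(b) * stdev(celsius)))
```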

resistant measure

any aspect of a distribution that is relatively unaffected by changes in the numerical value of a small proportion of the total number of observations, no matter how large these changes are
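
A small Python illustration with made-up numbers: replacing one observation with an extreme value leaves the median unchanged (it is resistant) but pulls the mean far away (it is not).

```python
from statistics import mean, median

incomes = [30, 35, 40, 45, 50]          # made-up data
with_extreme = [30, 35, 40, 45, 5000]   # same data, one value changed drastically

print(median(incomes), median(with_extreme))  # median is the same in both
print(mean(incomes), mean(with_extreme))      # mean jumps with the extreme value
```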

density curves

a graph that describes the overall pattern of the data

-ignores minor irregularities and outliers

-is always on or above the x-axis

-has area exactly 1 underneath it (the area under the curve over any range of values is the proportion of all observations that fall in that range)

mode of a density curve

peak point of a curve, where the curve is the highest

median of a density curve

the point that divides the area under the curve in half

mean of a density curve

the balancing point, at which the curve would balance if made of solid material

Normal curves (normal distributions)

symmetric, unimodal, and bell-shaped

-mean is at the center, same as the median

-shape determined by mean and standard deviation

68-95-99.7 Rule

in the normal distribution

-~68% of observations fall within 1 standard deviation of the mean

-~95% of observations fall within 2 standard deviations of the mean

-~99.7% of observations fall within 3 standard deviations of the mean
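
The three percentages can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the helper name `proportion_within` is made up for illustration.

```python
from statistics import NormalDist

def proportion_within(k, mu=0.0, sigma=1.0):
    """Proportion of a Normal(mu, sigma) distribution within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```

The result does not depend on the particular mean and standard deviation chosen, which is why the rule holds for every normal distribution.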

standardized value

"z-score"

-subtract the mean of the distribution from x and divide by the standard deviation

-tells how many standard deviations the original observation falls away from the mean, and in which direction

-obs. > mean = +

-obs. < mean = -

-Is a LINEAR TRANSFORMATION
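
A minimal sketch of the calculation in Python. The function name `z_score` and the numbers (a distribution with mean 64 and standard deviation 4) are hypothetical.

```python
def z_score(x, mean, sd):
    """Standardized value: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

print(z_score(72, 64, 4))   # 2.0  (observation above the mean, positive z)
print(z_score(58, 64, 4))   # -1.5 (observation below the mean, negative z)
```

Written as z = -mean/sd + (1/sd)*x, the calculation has the a + bx form, which is why standardizing counts as a linear transformation.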

standard Normal distribution

normal distribution N(0,1) with mean 0 and standard deviation 1

cumulative proportions

proportion of observations in a distribution that lie at or below a given value

-e.g. for a density curve, the cumulative proportion is the area under the curve to the left of a given value
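
Both readings of the definition can be sketched in Python: a count-based cumulative proportion for a data set, and the area to the left of a value under the standard Normal curve via `statistics.NormalDist.cdf` (Python 3.8+). The function name `cumulative_proportion` and the data are made up for illustration.

```python
from statistics import NormalDist

def cumulative_proportion(data, value):
    """Proportion of observations in data that lie at or below value."""
    return sum(1 for x in data if x <= value) / len(data)

data = [1, 2, 2, 3, 5]                       # made-up data
print(cumulative_proportion(data, 2))        # 0.6 (three of five observations)

# For a density curve: area under N(0,1) to the left of z.
standard_normal = NormalDist(0, 1)
print(standard_normal.cdf(0.0))              # half the area lies left of the mean
print(round(standard_normal.cdf(1.0), 4))    # area to the left of z = 1
```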

normal quantile plot

assesses the adequacy of a normal model for describing a distribution of data

-a pattern on such a plot that deviates substantially from a straight line indicates that the data are not normal
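
A rough sketch of how the points of such a plot could be computed in Python, without drawing anything. It assumes one common convention, pairing the i-th smallest observation with the standard Normal quantile at proportion (i - 0.5)/n; statistical software may use slightly different plotting positions. The function name `normal_quantile_points` and the data are hypothetical.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return (z, x) pairs: standard Normal quantile vs. sorted observation."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()          # N(0, 1)
    # enumerate starts at 0, so (i + 0.5) / n equals (rank - 0.5) / n
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

sample = [3.1, 2.4, 5.0, 4.2, 3.8]     # made-up data
for z, x in normal_quantile_points(sample):
    print(round(z, 3), x)
```

Plotting x against z and looking for a roughly straight line is the visual check the card describes.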

density estimator

looks at data and draws a density curve that describes the overall shape of the data

-joins stemplots and histograms as a useful graphical tool for exploratory data analysis
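
The card does not say which estimator is meant. As one possible illustration, the sketch below uses a Gaussian kernel density estimate, a common choice, with a hand-picked bandwidth. The function name `gaussian_kde`, the bandwidth value, and the data are all assumptions made for this example.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function estimating the density: an average of Normal bumps centered on each observation."""
    n = len(data)
    norm_const = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
        return total / norm_const

    return density

data = [1.0, 2.0, 2.5, 4.0]            # made-up data
f = gaussian_kde(data, bandwidth=0.5)  # bandwidth chosen by hand for illustration

# Like any density curve, the estimate is never negative and has total area 1.
step = 0.01
area = sum(f(-5 + i * step) for i in range(1501)) * step
print(round(area, 3))
```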
