Stats Chapter 1

outlier
an individual value that falls outside the overall pattern
-points that are clearly apart from the body of the data, not just the most extreme observations in a distribution
midpoint
center of a distribution
unimodal
a distribution with one major peak
skewed to the right
right tail (larger values) is much longer than the left tail (smaller values)
-skewed to the left: the left tail is much longer than the right tail
time plot
plots each observation against the time at which it was measured.
categorical variable
places each individual into a category
-e.g. male or female
-displayed by bar graphs and pie charts
quantitative variable
numerical values that measure some characteristic of each case
-e.g. height in cm, or annual salary in $
-displayed by stemplots and histograms
exploratory data analysis
uses graphs and numerical summaries to describe the variables in a data set and the relations among them
distribution
tells what values a variable takes and how often it takes these values
histogram
plots the frequencies (counts) or the percents of equal width classes of values
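The counting behind a histogram can be sketched in a few lines. This is only an illustration: the function name `histogram_counts`, the made-up data, and the choice to make each class include its left endpoint but not its right one are all assumptions, not something the card specifies.

```python
def histogram_counts(data, start, width, n_classes):
    """Count observations in n_classes equal-width classes beginning at start.

    Assumption: class i covers [start + i*width, start + (i+1)*width).
    """
    counts = [0] * n_classes
    for x in data:
        i = int((x - start) // width)  # index of the class containing x
        if 0 <= i < n_classes:
            counts[i] += 1
    return counts

data = [1, 2, 2, 3, 5, 6, 6, 7, 9]  # made-up observations
counts = histogram_counts(data, start=0, width=2, n_classes=5)
percents = [100 * c / len(data) for c in counts]
print(counts)
```

The bar heights of the histogram are either `counts` or `percents`; the shape is the same in both cases.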
upper quartile
the median of the upper half of the data
(vice versa with lower quartile)
pth percentile
in a distribution, the value that has p percent of the observations falling at or below it
five-number summary
summary that consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest
-Minimum, Q1, M, Q3, Maximum
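A minimal sketch of the five-number summary, assuming the quartiles are found as the medians of the lower and upper halves of the sorted data (with the overall median left out of both halves when the number of observations is odd). The function name and the data are made up for illustration; software packages often use slightly different quartile rules.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, M, Q3, maximum) for a list of numbers."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[:n // 2]            # observations to the left of the median
    upper = xs[n // 2 + n % 2:]    # observations to the right of the median
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

scores = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # made-up data
summary = five_number_summary(scores)
print(summary)
```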
boxplot
a graph of the five-number summary
-a central box spans the quartiles Q1 and Q3
-a line in the box marks the median M
-lines extend from the box out to the smallest and largest observations
interquartile range (IQR)
the distance between the first and third quartiles
the 1.5 x IQR rule for outliers
an observation is a suspected outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile
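The rule can be sketched as follows, again assuming quartiles are the medians of the lower and upper halves of the sorted data. The function name `suspected_outliers` and the data are hypothetical.

```python
from statistics import median

def suspected_outliers(data):
    """Return the observations flagged by the 1.5 x IQR rule."""
    xs = sorted(data)
    n = len(xs)
    q1 = median(xs[:n // 2])
    q3 = median(xs[n // 2 + n % 2:])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr    # anything below this is suspect
    high_fence = q3 + 1.5 * iqr   # anything above this is suspect
    return [x for x in xs if x < low_fence or x > high_fence]

values = [1, 3, 4, 6, 7, 8, 10, 12, 40]  # made-up data with one extreme value
flagged = suspected_outliers(values)
print(flagged)
```

Here Q1 = 3.5 and Q3 = 11, so IQR = 7.5 and the fences sit at -7.75 and 22.25; only 40 lies outside them.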
modified boxplot
a plot where suspected outliers are identified individually
standard deviation
measures the spread by looking at how far the observations are from their mean
-square root of the variance
-best for reasonably symmetric distributions that are free of outliers
variance (s squared)
the average of the squared deviations of the observations from their mean (the sum is divided by n-1, not n)
degrees of freedom
n-1, degrees of freedom of the variance or standard deviation
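A short sketch of how the variance, the standard deviation, and the n-1 degrees of freedom fit together. The helper name `sample_variance` and the data are made up; Python's standard `statistics.variance` and `statistics.stdev` use the same n-1 divisor and are shown only as a cross-check.

```python
from math import sqrt
import statistics

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    m = sum(data) / n
    return sum((x - m) ** 2 for x in data) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data with mean 5
s2 = sample_variance(data)         # variance: 32 / 7
s = sqrt(s2)                       # standard deviation: square root of the variance
print(s2, statistics.variance(data))
print(s, statistics.stdev(data))
```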
linear transformation
a change in the measurement unit
-does not change the shape of a distribution
-does change the center and spread, however
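As a sketch, a linear transformation can be written x_new = a + b*x. The mean is then transformed the same way, and the standard deviation is multiplied by |b|. The Celsius-to-Fahrenheit example and its data are assumptions chosen for illustration.

```python
from statistics import mean, stdev

celsius = [10.0, 15.0, 20.0, 25.0, 30.0]        # made-up temperatures
fahrenheit = [32 + 1.8 * c for c in celsius]    # x_new = a + b*x with a = 32, b = 1.8

mean_c, sd_c = mean(celsius), stdev(celsius)
mean_f, sd_f = mean(fahrenheit), stdev(fahrenheit)

print(mean_f, 32 + 1.8 * mean_c)  # the center shifts and rescales
print(sd_f, 1.8 * sd_c)           # the spread only rescales
```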
resistant measure
any aspect of a distribution that is relatively unaffected by changes in the numerical value of a small proportion of the total number of observations, no matter how large these changes are
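A quick illustration with made-up numbers: the median is a resistant measure, while the mean is not. Changing a single observation by a huge amount leaves the median alone but drags the mean far away.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]      # made-up data
changed = [30, 32, 35, 38, 4000]    # one observation altered drastically

print(median(incomes), median(changed))  # the median does not move
print(mean(incomes), mean(changed))      # the mean moves a great deal
```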
density curves
a graph that describes the overall pattern of the data
-ignores minor irregularities and outliers
-is always on or above the x-axis
-has area exactly 1 underneath it (the area over any range is the proportion of all observations that fall in that range)
mode of a density curve
peak point of a curve, where the curve is the highest
median of a density curve
the point that divides the area under the curve in half
mean of a density curve
the balancing point, at which the curve would balance if made of solid material
Normal curves (normal distributions)
symmetric, unimodal, and bell-shaped
-mean is at the center, same as the median
-shape determined by mean and standard deviation
68-95-99.7 Rule
in the normal distribution
-~68% of observations fall within 1 standard deviation of the mean
-~95% of observations fall within 2 standard deviations of the mean
-~99.7% of observations fall within 3 standard deviations of the mean
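The three percentages can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `proportion_within` is made up.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Area under the standard Normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within = {k: proportion_within(k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} standard deviation(s): {p:.4f}")
```

Because every Normal distribution becomes the standard Normal one after standardizing, the same proportions hold for any mean and standard deviation.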
standardized value
"z-score"
-subtract the mean of the distribution from x and divide by the standard deviation
-tells how many standard deviations the original observation falls away from the mean, and in which direction
-observations larger than the mean have positive z-scores
-observations smaller than the mean have negative z-scores
-Is a LINEAR TRANSFORMATION
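In symbols, z = (x - mean) / standard deviation. A minimal sketch, with a made-up distribution that has mean 100 and standard deviation 15:

```python
def z_score(x, mean, sd):
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    return (x - mean) / sd

z_above = z_score(130, mean=100, sd=15)  # observation larger than the mean
z_below = z_score(85, mean=100, sd=15)   # observation smaller than the mean
print(z_above, z_below)
```

The first value is two standard deviations above the mean and the second is one standard deviation below it, which is exactly what the signs and sizes of the z-scores report.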
standard Normal distribution
normal distribution N(0,1) with mean 0 and standard deviation 1
cumulative proportions
proportion of observations in a distribution that lie at or below a given value
-e.g. for a density curve, the cumulative proportion is the area under the curve to the left of a given value
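Both readings of the definition can be sketched side by side: counting in a data set, and taking the area to the left under a density curve. The helper name `cumulative_proportion` and the data are made up; the Normal area uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

def cumulative_proportion(data, value):
    """Proportion of observations that lie at or below value."""
    return sum(1 for x in data if x <= value) / len(data)

data = [2, 4, 4, 5, 7, 9, 10, 12]           # made-up observations
prop = cumulative_proportion(data, 5)       # 4 of the 8 observations are <= 5

# For a density curve: area to the left of 1 under the standard Normal curve.
area_left = NormalDist(mu=0, sigma=1).cdf(1)
print(prop, round(area_left, 4))
```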
normal quantile plot
assesses the adequacy of a normal model for describing a distribution of data
-a pattern on such a plot that deviates substantially from a straight line indicates that the data are not normal
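The points of a normal quantile plot can be computed without any plotting library. This is a sketch under one assumption: the i-th smallest of n observations is paired with the standard Normal quantile at proportion (i - 0.5)/n, which is one common convention among several. The function name and data are hypothetical.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Pair each sorted observation with a standard Normal quantile.

    Assumption: plotting position (i + 0.5) / n for zero-based index i.
    """
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

points = normal_quantile_points([5.6, 4.1, 7.3, 5.0, 6.2])  # made-up data
for z, x in points:
    print(f"{z:6.3f}  {x}")
```

Plotting the observations against the z values gives the normal quantile plot; points close to a straight line suggest a Normal model is adequate.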
density estimator
looks at data and draws a density curve that describes the overall shape of the data
-joins stemplots and histograms as useful graphical tools for exploratory data analysis