Terms in this set (45)
Standard Deviation
The typical distance of observations from their mean; the square root of the variance
Variance
The average squared distance of the observations from their mean
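The two definitions above can be worked through on a small example. This is a minimal sketch with made-up data; note that the card's "average" corresponds to dividing by n (the population version), while many courses divide by n - 1 for sample data, so both are shown as an assumption about which one is wanted.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations for illustration

mean = statistics.mean(data)
squared_deviations = [(x - mean) ** 2 for x in data]

# "Average squared distance": divide by n (population variance).
population_variance = sum(squared_deviations) / len(data)

# Sample variance divides by n - 1 instead (assumption: data is a sample).
sample_variance = sum(squared_deviations) / (len(data) - 1)

# Standard deviation is the square root of the variance.
population_sd = population_variance ** 0.5
sample_sd = sample_variance ** 0.5

print(population_variance, population_sd)
```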
Bar Graph
Displays the counts or percents of categories in a CATEGORICAL variable through differing heights of bars
Distribution
Tells you what values a variable takes and how often it takes these values
Pie Chart
Displays a categorical variable using slices sized by the counts or percents for the categories
Association
When specific values of one variable tend to occur in common with specific values of another; can be influenced by lurking variables
Mean
A measure of center, also called the average
Stemplot
A graphical display of quantitative data that involves splitting each individual value into two components (a stem and a leaf); organizes data from least to greatest
Dot Plot
One of the simplest graphs to construct when dealing with a small set of quantitative data
Inference
Drawing conclusions beyond the data at hand
Skewed
The shape of a distribution if one side of the graph is much longer than the other
Resistant
What we call a measure that is relatively unaffected by extreme observations
Median is resistant (not affected by outliers)
Mean is not resistant (affected by outliers)
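A quick way to see resistance is to add one extreme value to a small data set and compare how far each measure moves. The numbers here are made up for illustration.

```python
import statistics

values = [10, 12, 13, 14, 15]   # made-up data
with_outlier = values + [100]   # same data plus one extreme observation

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

# The mean is pulled strongly toward the outlier; the median barely moves.
print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```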
Individuals
the objects described by a set of data; can be people, animals, or things
Median
the midpoint of a distribution of quantitative data
Conditional Distribution
describes the distribution of values of a categorical variable among individuals who have a specific value of another variable; use a side by side bar graph or segmented bar graph
Segmented Bar Graph
used to show conditional distributions
Side by Side Bar Graph
used to show conditional distributions
Categorical Variable
A variable that places an individual into one of several groups or categories
Variable
A characteristic of an individual that can take different values for different individuals; can be categorical or quantitative
Two Way Table
A method of organizing data when comparing two categorical variables
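As an illustration of how a two-way table leads to conditional distributions, the sketch below stores a small table as nested dictionaries and converts each row of counts into proportions. The groups, categories, and counts are hypothetical.

```python
# Hypothetical two-way table: class year (rows) by survey answer (columns).
table = {
    "Freshman": {"Yes": 30, "No": 20},
    "Senior": {"Yes": 10, "No": 40},
}

def conditional_distribution(row):
    """Turn one row of counts into proportions that sum to 1."""
    total = sum(row.values())
    return {category: count / total for category, count in row.items()}

# Distribution of the answer among individuals in each class year.
conditional = {group: conditional_distribution(row) for group, row in table.items()}

for group, distribution in conditional.items():
    print(group, distribution)
```

These per-group proportions are what a segmented or side by side bar graph would display.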
Box Plot
A graphical display of the five-number summary; minimum, Q1, median, Q3, maximum
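The five-number summary can be computed by hand or in a few lines of code. This sketch assumes the common textbook convention in which Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the overall median left out of both halves when n is odd; software packages often use other quartile rules and may give slightly different values. The function name is hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle value when n is odd so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )

print(five_number_summary([13, 1, 7, 3, 11, 5, 9]))
```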
Histogram
A graphical display of QUANTITATIVE data that shows the frequency of values in intervals by using bars
Quantitative Variable
A variable that takes numerical values for which it makes sense to find an average
Symmetrical
The shape of a distribution whose right and left sides are approximate mirror images of each other
Quartiles
These values lie one-quarter (Q1), one-half, and three-quarters (Q3) of the way up the list of quantitative data
Outlier
A value that is more than 1.5 IQRs above the third quartile or below the first quartile
SOCS
Spread (range, IQR, Standard deviation), Outliers, Center (median, mean, quartiles), Shape (mode, skewed right/left)
Skewed Right
the long tail extends to the right (bars generally get shorter from left to right); the mean is usually to the right of the median
Skewed Left
the long tail extends to the left (bars generally get taller from left to right); the mean is usually to the left of the median
Unimodal
single peak in the distribution
Bimodal
2 peaks in the distribution
Multimodal
>2 peaks in distribution
IQR
Q3 - Q1
If the mean and median are equal then...
the distribution is roughly symmetric
Outlier equation
any value > Q3 + 1.5 (IQR)
any value < Q1 - 1.5 (IQR)
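The 1.5 × IQR rule can be sketched as a pair of small helper functions. The function names are hypothetical, and Q1 and Q3 are passed in directly because different quartile conventions give slightly different values. The data are made up, chosen so that the median-of-halves rule gives Q1 = 10 and Q3 = 20.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Return values strictly below the lower fence or strictly above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

data = [5, 10, 10, 15, 15, 20, 20, 50]  # made-up data
low, high = outlier_fences(10, 20)      # IQR = 10, so fences are -5 and 35
outliers = find_outliers(data, 10, 20)

print(low, high, outliers)
```

A value exactly on a fence (35 in this example) is not flagged, matching the strict inequalities on the card.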
Measures of Spread
Range (not resistant because it is affected by outliers), IQR (resistant), standard deviation (not resistant because it is based on the mean)
Deviations from mean add to what?
0
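This fact is easy to check numerically. The sketch below uses made-up data; with other data sets, floating-point rounding can leave a tiny nonzero remainder instead of an exact zero.

```python
data = [3, 7, 8, 10, 12]  # made-up observations

mean = sum(data) / len(data)
deviations = [x - mean for x in data]  # each observation minus the mean
total = sum(deviations)

# Positive and negative deviations cancel, which is why variance squares them.
print(deviations, total)
```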
How to tell if a box plot is skewed to the right
If the distance from the minimum to the median is much smaller than the distance from the median to the maximum, then the graph is right skewed
How to tell if a box plot is skewed to the left
If the distance from the median to the maximum is much smaller than the distance from the minimum to the median, then the graph is left skewed
relative frequency
frequency divided by n
cumulative frequency
add the counts in the frequency column for the current row and all rows with smaller values of the variable
cumulative relative frequency
cumulative frequency/n
Cumulative Relative Frequency Graph
Can be used to figure out percentile
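The frequency definitions above can be chained together in a short calculation. This is a minimal sketch using a hypothetical frequency table; the values and counts are made up.

```python
from itertools import accumulate

values = [1, 2, 3, 4]   # hypothetical values of the variable, in increasing order
counts = [2, 5, 8, 5]   # hypothetical frequency of each value
n = sum(counts)

relative = [c / n for c in counts]                  # frequency divided by n
cumulative = list(accumulate(counts))               # running total of the counts
cumulative_relative = [c / n for c in cumulative]   # cumulative frequency / n

for row in zip(values, counts, relative, cumulative, cumulative_relative):
    print(row)
```

Reading the last column as a percentile: here 75% of the observations are at or below the value 3, which is the information a cumulative relative frequency graph displays.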
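z score
the number of standard deviations a value lies above or below the mean; used to find the area under the standard Normal curve
z = (x - mean) / standard deviation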
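The z-score formula, read as (x minus the mean) divided by the standard deviation, can be sketched as follows. The numbers are made up, and the conversion from z to an area assumes the data follow a Normal distribution; `NormalDist` is from Python's standard `statistics` module (Python 3.8 or later).

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(130, 100, 15)   # made-up value, mean, and standard deviation

# Area under the standard Normal curve to the left of z
# (assumption: the variable is approximately Normal).
area_below = NormalDist().cdf(z)

print(z, round(area_below, 4))
```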
less variability =
smaller standard deviation
