What is Statistics?
a field that is dedicated to drawing conclusions about populations or processes based on samples drawn from them
population
- well-defined collection of objects of interest
- produces random outcomes
analytic study
- study that is not enumerative in nature
- concerns a "process" rather than a fixed, existing population
sample
any subset of the population
Good samples
simple random sample, a stratified random sample, or a cluster sample
Bad samples
convenience sample or a self-selected sample
In-between sample
systematic sample
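A minimal Python sketch of two of the sampling schemes named above, assuming the population is available as a list. The function names and the toy population of 20 labeled objects are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Draw n distinct objects so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return rng.sample(population, n)

def systematic_sample(population, k, start=0):
    """Take every k-th object, beginning at index `start`."""
    return population[start::k]

population = list(range(1, 21))  # hypothetical population: objects labeled 1..20
srs = simple_random_sample(population, 5)
every_fourth = systematic_sample(population, 4, start=2)
print(srs)
print(every_fourth)
```

In practice the starting index of a systematic sample is usually chosen at random; it is fixed here to keep the example easy to follow.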
Census
observe and collect every object in the population
Independent observations
best-case scenario
Dependent observations
messy; must describe/model the nature of the dependence
random variable
- any characteristic whose value can change from one observation to another
- what we actually measure
Notation
Variables denoted using letters at the end of the alphabet
Uppercase letters
values not yet observed
random variables
variables yet to be observed
Lowercase letters
actually observed or hypothetical values
Realizations
values observed (height)
Univariate
one variable measured for each object
Bivariate
two variables measured for each object (height and weight)
Multivariate
more than one variable measured for each object
two branches of statistics
descriptive and inferential
descriptive stats
summarizing and describing features of a set of data (median, average, graphs)
Important: descriptive stats
describes sample only
Inferential stats
using information in sample to draw conclusions about population or process
Notation: sample size
"n" or "m"
Notation: one sample
X1,X2,X3...Xn
Notation: two samples
X1,X2,X3...Xn
Y1,Y2,Y3...Ym
Variable
quantitative or numerical
Scale
numbers represent some ordering and relative size
Numerical variable: discrete
if set of possible values is finite or can be listed in infinite sequence
Numerical variable: continuous
possible values consist of one or more intervals of real numbers
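Order stats
- the sample as observed is usually not in increasing order (the order of observation usually doesn't matter)
- order statistics are the sample values arranged from smallest to largest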
Notation: S
all possible values
Countably infinite
number of rolls of a die until it lands on 6 (possible values 1, 2, 3, ...)
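A small simulation sketch of this example, assuming a fair six-sided die. The function name and the seed are hypothetical. The number of rolls can be any positive integer, so the possible values can be listed in an infinite sequence 1, 2, 3, ...

```python
import random

def rolls_until_six(rng):
    """Roll a fair die until a 6 appears; return how many rolls it took."""
    count = 0
    while True:
        count += 1
        if rng.randint(1, 6) == 6:
            return count

rng = random.Random(1)  # fixed seed, an arbitrary choice for repeatability
counts = [rolls_until_six(rng) for _ in range(1000)]
print(min(counts), max(counts))
```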
Height is...
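continuous (any value in an interval is possible, even though measurements are recorded to finite precision)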
Qualitative/categorical
if values are not measured on a numeric scale
Ordinal categorical variable
categorical variable has inherent ordering
Nominal categorical variable
categorical variable having no inherent ordering
"distribution" of a sample
"shape" of the data
Graphs for distribution
- stem-and-leaf displays
- dotplots
- histograms
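A rough sketch of how a stem-and-leaf display can be built, assuming nonnegative integer data with the tens digit used as the stem and the ones digit as the leaf. The function names and the data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted values by stem (tens) and record each leaf (ones digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

def render(display):
    """Format the display as one text line per stem."""
    lines = []
    for stem in sorted(display):
        leaves = "".join(str(leaf) for leaf in display[stem])
        lines.append(f"{stem} | {leaves}")
    return "\n".join(lines)

scores = [12, 15, 21, 27, 27, 30]  # made-up data for illustration
display = stem_and_leaf(scores)
print(render(display))
# 1 | 25
# 2 | 177
# 3 | 0
```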
Unimodal
distribution with only one peak
Bimodal
distribution with two different peaks
Multimodal
distribution with more than two peaks
Tail of distribution
- portion of distribution that is at the extreme
- where the distribution (frequency or density) is near 0
Heavy/long tail
tail drops SLOWLY toward 0
Light/short tail
tail drops QUICKLY toward 0
Heavy tailed distributions =
more extreme values
Tail: symmetric
left half mirrors right half
Tail: positively/right skewed
right tail longer
Tail: negatively/left skewed
left tail longer
Graph: bivariate numerical data
scatterplot
2 Common distribution questions
1. Where located?
2. How spread out is it?
Measures of central location
center of distribution
Common central locations
- mean
- median
Sample mean equation:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Sample mean physical interpretation
center of mass
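Sample median ($\tilde{x}$)
order the observations from smallest to largest and take the middle value (the average of the two middle values when n is even)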
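A minimal sketch of both measures of center, assuming the sample is a non-empty list of numbers. The mean is the sum of the observations divided by n; the median is the middle of the sorted values. The function names and data are hypothetical.

```python
def sample_mean(xs):
    """Sum of the observations divided by the sample size n."""
    return sum(xs) / len(xs)

def sample_median(xs):
    """Middle value of the sorted sample (average of the two middle values if n is even)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

data = [2, 4, 4, 5, 10]  # made-up sample; 10 is an unusually large value
print(sample_mean(data))            # 5.0
print(sample_median(data))          # 4
print(sample_median([1, 3, 5, 7]))  # 4.0
```

The large value 10 pulls the mean above the median, which is why the median is often preferred when the data contain extreme observations.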
Rounding
2 decimal places
population parameters
- population mean and median
- describes population, not sample
sample mean estimates:
population mean
sample median estimates:
population median
Outliers
- no firm universal definition
- observation that doesn't fit in with the rest of the data
Throw out outliers?
Only if confident they are external to the population or process (e.g., a recording error)
Outliers
may be a genuine feature of the population or process
First quartile
25th percentile
Second quartile
- 50th percentile
- sample median
Third quartile
75th percentile
Deciles
10th, 20th, ..., 90th percentiles (9 total)
p% trimmed mean
mean of the observations that remain after removing the smallest p% and the largest p% of the sample
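A minimal sketch of a p% trimmed mean. One assumption to note: when n·p/100 is not a whole number this version rounds the count down, whereas some textbooks interpolate instead. The function name and data are hypothetical.

```python
def trimmed_mean(xs, p):
    """Mean after dropping the smallest p% and largest p% of the sorted sample."""
    if not xs:
        raise ValueError("sample must not be empty")
    if not 0 <= p < 50:
        raise ValueError("p must satisfy 0 <= p < 50")
    s = sorted(xs)
    k = int(len(s) * p / 100)  # observations trimmed from EACH end (rounded down)
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up sample with one extreme value
print(trimmed_mean(data, 0))   # 14.5, the ordinary mean
print(trimmed_mean(data, 10))  # 5.5, after dropping 1 and 100
```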
3 properties of a measure of variability
1. no variability: measure = 0
2. measure always >= 0
3. as variability increases, the measure increases
Range
difference between the largest and smallest sample values. R = Max - Min
Most commonly used measure of variability
variance
Sample variance equation
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Sample standard deviation
$s = \sqrt{s^2}$
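A minimal sketch of the sample variance and sample standard deviation, assuming a sample of at least two numbers and the usual n - 1 divisor. The function names and data are hypothetical.

```python
import math

def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

data = [2, 4, 4, 5, 10]  # made-up sample with mean 5
print(sample_variance(data))       # 9.0
print(sample_sd(data))             # 3.0
print(sample_variance([7, 7, 7]))  # 0.0, no variability
```

The last line illustrates that the measure equals 0 when the sample shows no variability, and it can never be negative because it is built from squared deviations.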
Five-number summary
Min, Q1, Q2, Q3, Max
Basic Boxplot
graphical display of five number summary
Boxplot
box and whiskers plot
Modified boxplot
boxplot modified to show outliers as individual points
Side-by-side boxplots
boxplots for two or more data sets all plotted using common scale.
Interquartile range (IQR)
IQR = Q3 - Q1
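A sketch of the five-number summary and the IQR. Quartile conventions differ between textbooks and software; this version assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted sample, with the middle observation included in both halves when n is odd. The function names and data are hypothetical.

```python
def median_sorted(s):
    """Median of an already sorted list."""
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def five_number_summary(xs):
    """Return (Min, Q1, Q2, Q3, Max) using the median-of-halves convention."""
    s = sorted(xs)
    n = len(s)
    half = (n + 1) // 2   # size of each half; includes the median when n is odd
    lower = s[:half]
    upper = s[n - half:]
    return (s[0], median_sorted(lower), median_sorted(s), median_sorted(upper), s[-1])

data = [8, 3, 1, 6, 2, 7, 4, 5]  # made-up sample
summary = five_number_summary(data)
iqr = summary[3] - summary[1]    # IQR = Q3 - Q1
print(summary)  # (1, 2.5, 4.5, 6.5, 8)
print(iqr)      # 4.0
```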
