Unit 1 (Chapters 1 - 5) Vocab
DATA
Recorded Information
CONTEXT
The W's (Who, What, When, Where, Why, hoW)
CASE
The "Who": an individual about whom (or which) we have data
RESPONDENT
A person who answers a survey
SUBJECT
A person who participates in an experiment
PARTICIPANT
A person who participates in an experiment
EXPERIMENTAL UNIT
An animal, plant, etc. that is used in an experiment
RECORD
The information recorded about one individual case (a row in a data table)
VARIABLE
The "What": a characteristic recorded about each case
SAMPLE
The cases that were chosen from the larger population
POPULATION
All the cases we wish we knew about
IDENTIFIER VARIABLE
A variable used to name a case (ID #)
CATEGORICAL VARIABLE
A variable whose values are category names (labels)
QUANTITATIVE VARIABLE
A variable whose values are measured numerical amounts, with units
AREA PRINCIPLE
The area occupied by a part of a graph should correspond to the magnitude of the value it represents
BAR CHART
Shows the count for each category of a categorical variable as a bar, so categories can be compared
CATEGORICAL DATA CONDITION
The data are counts or percentages of individuals in non-overlapping categories
CONDITIONAL DISTRIBUTION
The distribution of one variable restricted to the cases that satisfy a condition on another variable. It focuses on one group (not the entire data set)
CONTINGENCY TABLE
A two-way table that shows how the counts are distributed across the categories of two categorical variables
FREQUENCY TABLE
A table that displays counts for categorical data
INDEPENDENCE
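Two variables are independent if the conditional distribution of one is the same for every category of the other (knowing one tells you nothing about the other)
MARGINAL DISTRIBUTION
The distribution of one variable alone in a contingency table, given by the row or column totals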
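A small sketch may help tie together the contingency table, marginal distribution, conditional distribution, and independence entries. The counts, the variable names, and the categories below are hypothetical, chosen only for illustration.

```python
# Hypothetical contingency table: rows are class year, columns are preferred pet.
table = {
    "junior": {"cat": 10, "dog": 30},
    "senior": {"cat": 20, "dog": 40},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of class year: row totals as proportions of the grand total.
marginal_year = {
    year: sum(row.values()) / grand_total for year, row in table.items()
}

# Conditional distribution of pet within each class year:
# each row's counts as proportions of that row's total.
pet_given_year = {
    year: {pet: count / sum(row.values()) for pet, count in row.items()}
    for year, row in table.items()
}

# The variables look independent only if every row has the same conditional distribution.
looks_independent = all(
    abs(pet_given_year["junior"][pet] - pet_given_year["senior"][pet]) < 1e-9
    for pet in ("cat", "dog")
)

print(marginal_year)
print(pet_given_year)
print(looks_independent)
```

Here 25% of juniors but about 33% of seniors prefer cats, so the conditional distributions differ and the two variables do not look independent in this made-up table.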
PIE CHART
Circle graph for categorical data
RELATIVE FREQUENCY BAR CHART
Uses bars to show the relative frequency (proportion or percent) of cases in each category
RELATIVE FREQUENCY TABLE
A table that shows the percent (or proportion) of all cases that fall in each category
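As a rough illustration of the frequency table and relative frequency table entries, the sketch below counts a hypothetical list of survey responses and then divides each count by the total number of cases.

```python
from collections import Counter

# Hypothetical categorical data: favorite color reported by 8 respondents.
responses = ["red", "blue", "red", "green", "red", "blue", "red", "green"]

# Frequency table: count of cases in each category.
frequency = Counter(responses)

# Relative frequency table: each count divided by the total number of cases.
total = sum(frequency.values())
relative_frequency = {category: count / total for category, count in frequency.items()}

for category in frequency:
    print(category, frequency[category], f"{relative_frequency[category]:.0%}")
```

The relative frequencies always add up to 1 (100%) because every case falls in exactly one category.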
SEGMENTED BAR CHART
A bar chart in which each bar is treated as the whole and divided into segments proportional to the percent in each category (stacked bar graph)
SIMPSON'S PARADOX
A trend that appears within each of several groups but disappears or reverses when the groups are combined
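Simpson's paradox is easier to see with numbers. The sketch below uses hypothetical success counts for two methods, A and B, on easy and hard cases; the method names, group names, and counts are all made up for illustration.

```python
# Hypothetical (successes, attempts) for two methods on easy and hard cases.
results = {
    "A": {"easy": (9, 10), "hard": (30, 100)},
    "B": {"easy": (80, 100), "hard": (2, 10)},
}

# Success rate within each group, for each method.
within_group = {
    method: {group: s / n for group, (s, n) in groups.items()}
    for method, groups in results.items()
}

# Success rate after combining the groups (pool successes and attempts).
combined = {
    method: sum(s for s, _ in groups.values()) / sum(n for _, n in groups.values())
    for method, groups in results.items()
}

print(within_group)  # A beats B on easy cases and on hard cases
print(combined)      # yet B beats A when the groups are pooled
```

The reversal happens because method A was used mostly on hard cases and method B mostly on easy ones, so the pooled rates compare unlike mixes of cases.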
HISTOGRAM
A display for quantitative data that uses adjacent bars to show the count of cases in each bin
RELATIVE FREQUENCY HISTOGRAM
A histogram whose bar heights show the proportion (or percent) of cases in each bin rather than the count
GAP
A region of the distribution that has no values
STEM-AND-LEAF DISPLAY
Displays quantitative data values in a way that sketches the distribution
DOTPLOT
Display that graphs a dot for each case against a single axis
SHAPE
Describe the number of modes, the symmetry or skewness, and any outliers or gaps.
CENTER
The mean or the median
SPREAD
The IQR or Standard Deviation
MODE
A hump or local high point
UNIMODAL
One mode (one hump)
BIMODAL
Two modes (two humps)
MULTIMODAL
More than two modes (three or more humps)
UNIFORM
A distribution that is relatively flat (no modes or humps)
SYMMETRIC
When a distribution has two halves on opposite sides of the center that look like mirror images
TAILS
The part of the distribution that trails off on either side
SKEWED
A distribution that is not symmetric, with one tail stretching out farther than the other
OUTLIERS
Extreme values that don't appear to belong with the rest of the data
INTERQUARTILE RANGE
Q3 - Q1
PERCENTILE
The nth percentile is the value that falls above n% of the data
5-NUMBER SUMMARY
Minimum, Q1, Median, Q3, Maximum
BOXPLOT
A display for the 5-Number Summary and outliers
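The sketch below computes a 5-number summary and IQR for a small hypothetical data set. Two things here are assumptions rather than part of the definitions above: quartiles are taken as the medians of the lower and upper halves with the overall median left out when the count is odd (textbooks and software differ on this convention), and outliers are flagged with the common 1.5 x IQR fence rule often used when drawing boxplots.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd count, leave the middle value out of both halves.
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# Hypothetical quantitative data.
values = [1, 3, 4, 6, 7, 8, 9, 11, 30]

summary = five_number_summary(values)
minimum, q1, med, q3, maximum = summary
iqr = q3 - q1

# Assumed convention: points beyond 1.5 IQRs from the quartiles are flagged as outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < low_fence or v > high_fence]

print(summary, iqr, outliers)
```

For this data the quartiles are 3.5 and 10, so the IQR is 6.5 and the value 30 falls above the upper fence of 19.75.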
MEAN
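The average: the sum of the values divided by the count. Paired with the Standard Deviation
RESISTANT
A summary is resistant if outliers have only a small effect on its value
STANDARD DEVIATION
The square root of the variance; roughly the typical distance of values from the mean
VARIANCE
The sum of the squared deviations from the mean, divided by the count minus 1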
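A worked computation may make the mean, variance, and standard deviation entries concrete. The data values are hypothetical, and the sketch uses the sample formulas, which divide by n - 1; the results are compared against Python's statistics module, whose variance and stdev functions use the same n - 1 convention.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical quantitative data.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

# Mean: sum of the values divided by the count.
y_bar = sum(data) / n

# Variance: sum of squared deviations from the mean, divided by n - 1.
squared_deviations = [(y - y_bar) ** 2 for y in data]
sample_variance = sum(squared_deviations) / (n - 1)

# Standard deviation: square root of the variance.
sample_sd = sqrt(sample_variance)

print(y_bar, sample_variance, sample_sd)
print(mean(data), variance(data), stdev(data))  # library values for comparison
```

Here the squared deviations add up to 32, so the variance is 32 / 7 (about 4.57) and the standard deviation is about 2.14.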
COMPARING DISTRIBUTIONS
Shape, Outliers, Center, Spread (SOCS)
TIMEPLOT
Displays data that change over time to show long-term patterns
RE-EXPRESS
Apply a simple function to data to make data more symmetric or linear
68-95-99.7 RULE
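In a Normal model, about 68% of values fall within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD
NEARLY NORMAL CONDITION
The histogram of the data is unimodal and roughly symmetric
NORMAL MODEL
A bell-shaped model for unimodal, roughly symmetric distributions, specified by its mean and standard deviation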
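The three percentages in the 68-95-99.7 rule are rounded areas under a Normal curve. As a quick check, the sketch below uses Python's statistics.NormalDist to compute the proportion within 1, 2, and 3 standard deviations of the mean for a Normal model with mean 0 and standard deviation 1.

```python
from statistics import NormalDist

# Normal model with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of the model within k standard deviations of the mean.
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, proportion in within.items():
    print(f"within {k} SD: {proportion:.4f}")
```

The computed proportions are roughly 0.6827, 0.9545, and 0.9973, which round to the percentages in the rule.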
NORMAL PERCENTILE
The percent of values in a Standard Normal Distribution at the z-score or below it
NORMAL PROBABILITY PLOT
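A display that helps assess whether a distribution is approximately Normal
NORMALITY ASSUMPTION
The assumption that a variable's distribution is Normal; it should be checked before using a Normal model
PARAMETER
A numerical value that specifies a model (for example, the mean or standard deviation of a population)
RESCALE
Multiplying or dividing every value by a constant (center and spread are both multiplied or divided by that constant; shape stays the same)
SHIFTING
Adding or subtracting a constant to every value (center and position measures shift by the constant; spread and shape stay the same)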
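The effect of shifting and rescaling on center and spread can be checked numerically. The sketch below uses hypothetical heights in centimeters, shifts them by adding 5 cm, and rescales them by converting to meters.

```python
from statistics import mean, stdev

# Hypothetical heights in centimeters.
heights_cm = [150, 160, 170, 180]

# Shifting: add a constant to every value.
shifted = [h + 5 for h in heights_cm]

# Rescaling: divide every value by a constant (centimeters to meters).
rescaled = [h / 100 for h in heights_cm]

print(mean(heights_cm), stdev(heights_cm))
print(mean(shifted), stdev(shifted))    # mean goes up by 5, SD unchanged
print(mean(rescaled), stdev(rescaled))  # mean and SD are both divided by 100
```

Shifting moves the mean from 165 to 170 and leaves the standard deviation alone, while rescaling divides both the mean and the standard deviation by 100.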
STANDARD NORMAL MODEL
A normal model with mean = 0 and standard deviation = 1
STATISTIC
A value calculated from data to summarize aspects of the data
Z-SCORE
How many standard deviations a value is above or below the mean: z = (value - mean) / SD
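The z-score and Normal percentile entries can be combined in one short calculation. The test score, mean, and standard deviation below are hypothetical, and the helper name z_score is introduced only for this sketch.

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical: a score of 130 in a distribution with mean 100 and SD 15.
z = z_score(130, 100, 15)

# Normal percentile: proportion of a standard Normal model at or below z.
normal_percentile = NormalDist().cdf(z)

print(z, round(normal_percentile, 4))
```

A z-score of 2 corresponds to a Normal percentile of about 0.977, meaning roughly 97.7% of values in the model lie at or below that score.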
