Stats Final
Terms in this set (108)
Type I Error
False Positive
Reject the null when it is actually true
Type II Error
False Negative
Fail to reject the null when it is actually false
Paired Data Condition is specific to
Paired Sample T Test
Similar spread is specific to
ANOVA
Post-hoc is required in ANOVA when you ______ the null
Reject
When to use correlation test
To test a relationship between two quantitative variables
When to use ONE Proportion Z Test
To test a claim about a single proportion
When to use TWO proportion Z Test
To test the difference between two proportions
When to use Chi Square Goodness of Fit
To see if the distribution of a categorical variable follows a known/ proposed distribution
When to use Chi Square Homogeneity
To test if the distribution of a categorical variable is the same across groups
When to use Paired Samples T-Test
To test the difference between the means of two paired/dependent groups
When to use ANOVA
To compare the means of more than two independent groups
Name the test: Is there a relationship between hours of sleep per night and exam scores?
Correlation
Name the test: Predict the final exam score of a student who earned a 75 on the first exam
Regression
Name the test: To test Mars' claim that 13% of their M&Ms are red, a bag of 122 M&Ms was purchased and 22 were red. Is the company's claim incorrect?
One Proportion Z-Test
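The M&M card above can be worked through as a short sketch. The numbers (13% claimed, 122 candies, 22 red) come from the card; the two-sided alternative is an assumption based on the wording "is the claim incorrect", and the variable names are illustrative only.

```python
import math
from statistics import NormalDist

# Values taken from the flashcard: claimed proportion, sample size, red count.
p0 = 0.13
n = 122
red = 22

p_hat = red / n                      # observed sample proportion
se = math.sqrt(p0 * (1 - p0) / n)    # standard error computed under the null
z = (p_hat - p0) / se                # test statistic

# Two-sided p-value (assumed): deviations in either direction count as evidence.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, p-value = {p_value:.3f}")
```

A small p-value (below the chosen alpha level) would lead to rejecting the null; otherwise we fail to reject it.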
Name the test: Do sky ads have a larger click-through rate than leaderboard ads?
Two Proportion Z-Test
Name the test: Mars claims that each bag of M&Ms has 24% blue, 20% orange, 16% green, 14% yellow, 13% red and 13% brown. A bag of 200 M&Ms was purchased that contained 45 blue, 35 orange, 32 green, 27 yellow, 25 red and 36 brown. Test Mars' claim.
Chi Square Goodness of Fit
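As a rough sketch of the goodness-of-fit calculation for the card above: the claimed percentages and observed counts come from the card, while the alpha level of 0.05 and its table critical value (about 11.07 for 5 degrees of freedom) are assumptions added for illustration.

```python
# Claimed color distribution and observed counts, both from the flashcard.
claimed = {"blue": 0.24, "orange": 0.20, "green": 0.16,
           "yellow": 0.14, "red": 0.13, "brown": 0.13}
observed = {"blue": 45, "orange": 35, "green": 32,
            "yellow": 27, "red": 25, "brown": 36}

total = sum(observed.values())
# Expected count for each color if Mars' claim is true.
expected = {color: total * p for color, p in claimed.items()}

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in claimed)
df = len(claimed) - 1                # categories minus one

# Assumption: alpha = 0.05, so the table critical value for df = 5 is about 11.07.
critical_value = 11.07
reject_null = chi_square > critical_value

print(f"chi-square = {chi_square:.3f}, df = {df}, reject null: {reject_null}")
```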
Name the test: Are personality (introvert/extrovert) and color preference related?
Chi Square Independence
Name the test: Is there a difference in distribution in ad recall among three kinds of magazine ads?
Chi Square Homogeneity
Name the test: Is there evidence supporting Subway's claim that their mean footlong sandwich length is equal to 12 inches?
One Sample T-Test
Name the test: Do consumers believe that a product is healthier if the nutrition label is green, compared to a red label?
Independent Samples T-Test
Name the test: Do women, on average, pay more for personal care products than men?
Paired Samples T-Test
Name the test: Is weight loss related to weight loss program? 4 weight loss programs were used in this experiment
ANOVA
Observations such as test scores, grades, or response times
Data
Something that can vary and take on different values
Variable
Type of variable where the values are numerical measurements
Quantitative
Type of variable where the values designate different categories
Categorical
The set of all scores of a particular variable
Distribution
All the possible subjects or cases of interest
Population
Selection of cases from a population of interest
Sample
Assigning a value to data to show which category it belongs to
Coding
The number of times a value is observed in a dataset
Frequency
A proportion (number of observations out of the total observations)
Relative Frequency
A distribution in which the left side mirrors the right side
Symmetric
A distribution with a tail that extends to the right
Right skewed
A distribution with one "mound"
Unimodal
A distribution with two "mounds"
Bimodal
A graphical display for categorical data, with data grouped in categories
Bar graph
A graphical display of quantitative data, with data grouped in bins
Histogram
The interval used to group quantitative data
Bin
Middle value in a data set; half the scores are above it and half are below it
Median
Numerical average of a set of scores
Mean
Difference between the highest and lowest scores in a dataset
Range
Divides a distribution into four parts with an equal number of data points
Quartiles
Data value such that "i" percent of the data lies below that value
"i"th percentile
The middle 50% of scores in a distribution are found here
Interquartile range
Sum of squared deviations from the mean divided by (n-1)
Variance
Square root of the variance ("typical" distance of a score from the mean)
Standard deviation
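The variance and standard deviation definitions above can be sketched in a few lines. The scores are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math
import statistics

# Hypothetical exam scores, for illustration only.
scores = [70, 75, 80, 85, 90]

n = len(scores)
mean = sum(scores) / n
# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in scores)

variance = sum_sq_dev / (n - 1)      # sample variance: divide by (n-1)
std_dev = math.sqrt(variance)        # standard deviation: square root of variance

# The standard library's sample variance should agree with the manual result.
library_variance = statistics.variance(scores)

print(f"variance = {variance}, standard deviation = {std_dev:.3f}")
```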
Extreme value that does not appear to belong with the rest of the data
Outlier
Useful family of models for unimodal, symmetric distributions
Normal model
Tells how many standard deviations a value is from the mean
Z-score
A normal model with mean mu=0 and standard deviation sigma=1
Standard Normal Model
Met if the distribution is unimodal and symmetric
Nearly normal condition
Empirical Rule
68-95-99.7 Rule
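The z-score, standard normal model, and 68-95-99.7 rule cards above can be checked numerically. This is a minimal sketch using Python's `statistics.NormalDist`; the example value, mean, and standard deviation are hypothetical.

```python
from statistics import NormalDist

# Standard normal model: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

# Proportion of the distribution within k standard deviations of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"within {k} SD: {share:.4f}")

# Z-score of a hypothetical value: distance from the mean in standard deviations.
value, mu, sigma = 85, 80, 10
z_score = (value - mu) / sigma
print(f"z-score = {z_score}")
```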
A measure of the strength and direction of a linear relationship between two quantitative variables
Correlation
A display of the relationship between two quantitative variables
Scatterplot
Symbol for the correlation coefficient, which measures the linear relationship between two quantitative variables
r
As the values of one variable increase, those of another tend to increase as well
Positive Relationship
A decrease in one variable tends to accompany an increase in the other
Negative Relationship
An extreme value that can produce an unreliable correlation statistic
Outlier
A variable other than x and y that simultaneously affects both variables, accounting for the correlation between the two
Lurking variable
Value on the fitted line
Predicted value
The difference between the data value and the corresponding value predicted by the regression model
Residual
Gives the fraction of the variability of the y accounted for by the least squares linear regression on x
Coefficient of determination
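To make the predicted value, residual, and coefficient of determination cards concrete, here is a small least squares sketch. The (x, y) pairs are hypothetical and the variable names are illustrative.

```python
# Hypothetical data: x is the predictor, y is the response.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares slope and intercept.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Predicted values lie on the fitted line; residual = data value - predicted value.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Coefficient of determination: fraction of the variability in y explained by x.
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(f"y-hat = {intercept:.2f} + {slope:.2f}x, R^2 = {r_squared:.2f}")
```

Predicting for an x value far outside 1 to 5 here would be extrapolation, which the fitted line cannot vouch for.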
Predictions that occur outside the range of the data; not as reliable because we don't know what happens to the trend
Extrapolation
Data points whose x values are far from the mean of x
High leverage
A point that, if omitted from the data, results in a very different regression model
Influential point
Numerical value for population data
Parameter
Drawing conclusions about a population from a sample
Statistical inference
Distribution of the statistic over all possible samples
Sampling distribution
Sample-to-sample variation
Sampling variability (error)
Probability of attaining the test statistic or one more extreme in the direction of the alternative hypothesis given the null hypothesis is true
P-Value
Used when interested in deviations in only one direction away from the hypothesized parameter value
One-Sided Test
Used when interested in deviations in either direction away from the hypothesized parameter value
Two-Sided Test
To conclude that your sample does not come from the null distribution
Reject the null
To conclude that your sample could come from the null distribution
Fail to reject the null
Highest chi square you can expect by chance alone
Critical Value
Interval likely to contain the true population parameter
Confidence interval
Half the width of a confidence interval
Margin of error
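A sketch of a confidence interval and its margin of error, reusing the M&M sample from the one-proportion card (22 red out of 122). The 95% confidence level is an assumption, and the interval uses the usual normal approximation for a proportion.

```python
import math
from statistics import NormalDist

# Sample from the M&M card: 22 red candies out of 122.
n = 122
red = 22
p_hat = red / n

# Assumption: 95% confidence, so the critical z value is about 1.96.
confidence = 0.95
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error from the sample
margin_of_error = z_star * se             # half the width of the interval
lower = p_hat - margin_of_error
upper = p_hat + margin_of_error

print(f"{confidence:.0%} CI: ({lower:.3f}, {upper:.3f}), margin = {margin_of_error:.3f}")
```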
Number of values in the final calculation of a statistic that are allowed to vary
Degrees of freedom
Likelihood of observing the calculated CHI SQUARE statistic, when the model fits the data
P-Value
You know what level the people are on when you ask them about the other categorical variable
Chi Square Test of Homogeneity
You have to ask people about both categorical variables
Chi Square Test of Independence
Interested in testing whether a single categorical variable follows a distribution that has already been proposed/collected
Chi Square Goodness of Fit
Distribution of a variable by itself
Marginal distribution
How many peaks, symmetry, outliers
Shape
Typical value of the distribution
Center
Tells you if data hovers around center or are spread out
Spread
Unimodal, symmetric, no outliers
Normal distribution
You can summarize the relationship with a straight line
Linear correlation
Line that summarizes the relationship between 2 variables; allows you to predict the dependent variable from the independent variable; also called the line of best fit
Regression line
Where the smallest sum of squares is
Line of best fit
Allows you to test a claim about a parameter
Hypothesis Test
Natural threshold for what is and is not in natural sampling variability; also the probability of committing a Type I Error
Alpha Level (Significance Level)
Business as usual/ nothing has changed
Null Hypothesis
The research hypothesis/ what you are trying to show
Alternative hypothesis
Standardized sample statistic used to get probabilities
Test statistic
Not specifically interested in greater/less than; just to see if different
Two tailed test
Specifically interested in greater than/ less than
One tailed test
The probability of committing a Type II error
Beta
Probability of rejecting the null hypothesis when the null is false
Power
The sampling distribution of the sample mean will be approximately normal regardless of the population distribution, provided the sample is large enough and the assumptions/conditions are met
Central Limit Theorem
Analysis of variance
ANOVA
How much each mean varies around the grand mean
Mean square between
How much each observation varies around the group mean
Mean square within
How much each observation varies around the grand mean
Total variability
Tells us which means differ if the 1 way ANOVA indicates that the means differ
Post Hoc Analysis
Adjusts the significance level when making multiple pairwise comparisons of the means
Bonferroni Adjustment
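As a rough illustration of the Bonferroni adjustment, suppose an ANOVA on four groups (as in the weight loss program card) rejects the null. The overall alpha of 0.05 is an assumption; the adjustment divides it by the number of pairwise comparisons.

```python
import math

groups = 4           # e.g. the four weight loss programs in the ANOVA card
alpha = 0.05         # assumed overall significance level

# Number of distinct pairs of group means to compare.
comparisons = math.comb(groups, 2)

# Bonferroni: each pairwise test uses alpha divided by the number of comparisons.
adjusted_alpha = alpha / comparisons

print(f"{comparisons} comparisons, each tested at alpha = {adjusted_alpha:.4f}")
```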