Provide the formula for bivariate regression and label the parts.
Y = a + bX + e, where Y = the DV, a = the intercept, b = the slope coefficient, X = the IV, and e = the error term.
Provide the formula for the 95% confidence interval and label the parts.
CI = x̄ ± 1.96(s/√N), where x̄ = the sample mean, s = the sample standard deviation, and N = the sample size.
1.96 = z-score associated with the 95% CI
Mode
The most frequently occurring case.
Median
The score in a rank-ordered distribution that is exactly in the middle, such that 50% of scores are higher and 50% are lower.
Mean
The average of a set of scores.
Range
Shows the full dispersion of numbers in the sample: the difference between the highest and lowest values.
Variance
Demonstrates the variation between all of the cases in the sample.
Standard deviation
Displays the average distance of the cases in the sample from the mean of the sample.
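The measures above can be sketched with Python's standard statistics module (the scores are hypothetical):

```python
import statistics

scores = [2, 4, 4, 5, 7, 9]  # hypothetical sample of scores

mode = statistics.mode(scores)           # most frequently occurring case
median = statistics.median(scores)       # middle of the rank-ordered scores
mean = statistics.mean(scores)           # average of the scores
value_range = max(scores) - min(scores)  # full dispersion of the sample
variance = statistics.variance(scores)   # sample variance (n - 1 denominator)
std_dev = statistics.stdev(scores)       # average distance from the mean
```

Note that statistics.variance and statistics.stdev use the sample (n - 1) formulas; pvariance and pstdev give the population versions.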
Levels of Measurement
Nominal: numbers assigned to a set of categories (gender, race, religion)
Ordinal: numbers assigned to rank-ordered categories
Interval: assumption of equal distance between objects on scale
Ratio: variables that have a natural zero (weight, length, temperature in Kelvin)
Inferential statistics
Concerned with making predictions or inferences about a population from observation and analysis of a sample.
Negatively skewed distribution
A distribution with few extremely low values
Positively skewed distribution
A distribution with few extremely high values
Interquartile Range (IQR)
The width of the middle 50% of the distribution; it is defined as the difference between the upper and lower quartiles (Q3 - Q1).
Normal distribution
A bell-shaped and symmetrical theoretical distribution, with the mean, the median, and the mode all coinciding at its peak and with the frequencies gradually decreasing at both ends of the curve.
Standard (Z) score
The number of standard deviations that a given raw score is above or below the mean.
Sampling error
The discrepancy between a sample estimate of a population parameter and the real population parameter.
Cronbach's alpha should be .7 or higher; if alpha is negative, there is an inverse relationship between the variables.
How to improve reliability
1. Make sure the instructions are standardized and clear across settings. 2. Increase the number of items or observations. 3. Delete unclear items. 4. Moderate the easiness and difficulty of tests. 5. Minimize the effect of external events.
How to find percentile rank
1. Convert the raw score to a z score. 2. Find the area beyond z in the standard normal table. 3. Subtract the area from 1.00 for the percentile and multiply by 100 for the percent.
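A minimal sketch of these steps in Python, with statistics.NormalDist standing in for the printed table (the raw score, mean, and SD are hypothetical):

```python
from statistics import NormalDist

raw, mean, sd = 600, 500, 100  # hypothetical test score, mean, and SD

# 1. Convert the raw score to a z score.
z = (raw - mean) / sd

# 2. Find the area beyond z (the normal CDF replaces the table lookup).
area_beyond = 1 - NormalDist().cdf(z)

# 3. Subtract from 1.00 for the percentile; multiply by 100 for the percent.
percentile = (1.00 - area_beyond) * 100
print(round(percentile, 2))  # 84.13
```

So a score one SD above the mean falls at roughly the 84th percentile.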
Central Limit Theorem
If all possible random samples of size N are drawn from a population with a mean and a standard deviation, then as N becomes larger, the sampling distribution approaches normal. With sufficient sample size, the distribution will be normal regardless of the shape.
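A quick simulation sketch of the theorem, using a hypothetical skewed population:

```python
import random
import statistics

random.seed(0)

# A heavily right-skewed population (exponential, population mean near 1.0).
population = [random.expovariate(1.0) for _ in range(10_000)]

# Draw many random samples of size N and keep each sample mean.
N = 50
sample_means = [statistics.mean(random.sample(population, N))
                for _ in range(1_000)]

# The sample means cluster near the population mean and are roughly
# normally distributed, despite the skew of the population itself.
print(round(statistics.mean(sample_means), 2))
```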
What is the 95% confidence interval
There is a 95% probability that a specified interval will contain the population mean. Less confident, but more precise (narrower), than a 99% interval.
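A minimal sketch with hypothetical scores, using the 1.96 z value from the card above:

```python
import statistics

sample = [82, 90, 76, 88, 95, 79, 84, 91, 73, 87]  # hypothetical scores

n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / n ** 0.5  # estimated standard error

# 95% CI: mean +/- 1.96 * SE.
lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

With a small sample like this one, a t value with n - 1 df would strictly be more appropriate than 1.96; the z value is used here only to mirror the formula on the card.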
Steps of Hypothesis Testing
1. Making assumptions. 2. Stating the hypotheses. 3. Selecting a test statistic. 4. Computing the test statistic. 5. Interpreting the results.
Z vs. T
We use the Z statistic when we know the population variance; however, in most situations we don't have this information, so we rely on the t statistic, estimating the population parameters using information from the sample.
Alpha level
The level of probability at which the null hypothesis is rejected. Usually set at .05.
Type 1 Error
Rejecting the null hypothesis when it is actually true.
Type 2 Error
Failing to reject the null hypothesis when it is actually false.
T-Test
Appropriate when you have a single interval DV and a dichotomous IV and want to test the difference between, or compare, means. Used when sample sizes are small.
Independent Sample T-Test
Used to compare the means of two independent sampled groups.
Paired Sample T-Test
Compares means when two groups are correlated, as in before-after, repeated measures, etc.
If the absolute value of t is greater than 1.96 (at the .05 level, with large samples), the difference between the means is significantly different from zero.
The difference between two means divided by the estimated standard error of the difference.
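That ratio can be sketched for two hypothetical independent groups, using a pooled-variance estimate of the standard error:

```python
import statistics

group1 = [12, 15, 11, 14, 13]  # hypothetical scores, group 1
group2 = [10, 9, 12, 8, 11]    # hypothetical scores, group 2

n1, n2 = len(group1), len(group2)
m1, m2 = statistics.mean(group1), statistics.mean(group2)

# Pooled variance, then the estimated standard error of the difference.
sp2 = ((n1 - 1) * statistics.variance(group1) +
       (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
se_diff = (sp2 * (1 / n1 + 1 / n2)) ** 0.5

# t = difference between the two means / estimated SE of the difference.
t = (m1 - m2) / se_diff
print(t)  # 3.0
```

The obtained t would then be compared against the critical t for df = n1 + n2 - 2.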
ANOVA (Analysis of Variance)
Designed to test for significant differences between means across two or more samples.
Assumes independent random sampling, that the DV is interval-ratio, that the population is normally distributed, and that the variances are equal.
Ratio of between-group variance to within-group variance.
F = (SSB / df_b) / (SSW / df_w) = (mean square between) / (mean square within)
Critical F
The F-score associated with a particular alpha level and df.
If the obtained F is greater than the critical F, reject the null. If the obtained F is less than the critical F, fail to reject the null.
Obtained F
The test statistic, computed as the ratio of between-group to within-group variance.
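The F ratio can be computed by hand for three small hypothetical groups:

```python
import statistics

groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]  # hypothetical DV scores

all_scores = [x for g in groups for x in g]
grand_mean = statistics.mean(all_scores)
k, n = len(groups), len(all_scores)

# Between-group sum of squares: group size times squared mean deviation.
ssb = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: deviations from each group's own mean.
ssw = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

df_b, df_w = k - 1, n - k
f = (ssb / df_b) / (ssw / df_w)  # mean square between / mean square within
print(f)  # 27.0
```

Here the group means differ sharply while the scores within each group barely vary, so between-group variance dwarfs within-group variance and F is large.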
Effect size for ANOVA
The closer it is to 1, the stronger the effect.
.2 = weak, .5 = medium, .8 = strong
One IV and one DV, ratio or interval only
Pearson's Correlation Coefficient (r)
Ranges from -1 to +1; the sign indicates the direction, and the closer to ±1, the stronger the association between X and Y.
Coefficient of determination (r^2)
The proportion of the total variation in the DV explained by the IV.
(regression sum of squares)/(total sum of squares)
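A minimal sketch of r and r^2 computed from deviation scores, with hypothetical x/y data:

```python
x = [1, 2, 3, 4, 5]  # hypothetical IV values
y = [2, 4, 5, 4, 5]  # hypothetical DV values

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Sums of deviation cross-products and squared deviations.
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r = sxy / (sxx * syy) ** 0.5  # Pearson's r: ranges from -1 to +1
r2 = r ** 2                   # proportion of variation in y explained by x
print(round(r, 2), round(r2, 2))
```

For these numbers r is about .77 and r^2 is .60, i.e. x accounts for about 60% of the variation in y.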
OLS Regression Assumptions
No specification error.
No measurement error.
Error Terms have zero mean.
IV uncorrelated with error term.
Error term normally distributed.
No perfect multicollinearity.
No specification error.
The relationship between X and Y is linear.
No relevant IVs have been excluded.
No irrelevant IVs have been included.
If we exclude a relevant IV, its effect is absorbed into the error term, violating the assumption that the IVs are uncorrelated with the error term and biasing the estimates. When we include irrelevant IVs, we get higher variances of the estimated coefficients.
No Measurement Error.
The variables X and Y are accurately measured.
The DV must be numeric, continuous (interval or ratio), and unbounded; the IVs must be numeric and continuous or dichotomous. If this is violated, the estimates will be inefficient.
Error terms should have a mean of zero. Consequences if not: biased intercept, unbiased slope coefficients.
Homoskedasticity
The variance of the error term around the regression line is constant.
Pure heteroskedasticity: the error variance is not constant even in a correctly specified model; a wide range between the largest and smallest values.
Impure heteroskedasticity: caused by specification error, such as leaving out IVs that are important or including IVs that are irrelevant.
Consequences: no bias in the coefficient estimates, but OLS no longer has minimum variance, and OLS underestimates the variances and SEs.
The error terms are uncorrelated.
Pure serial correlation: occurs in correctly specified models.
Impure serial correlation: caused by specification error; underestimates SEs and increases the chance of rejecting the null when you shouldn't.
The IV is uncorrelated with the error term.
The error term is normally distributed
If this assumption is violated, significance tests will be invalid.
No perfect multicollinearity: the IVs should be correlated with the DV, not with each other.
Perfect multicollinearity: the regression won't run.
Severe multicollinearity: the regression will run and technically does not violate the assumption, but:
Variances and SEs increase.
t-scores become insignificant.
Estimates will become sensitive to changes in specification.
Overall fit will be unaffected.
The worse the multicollinearity, the worse the consequences will be.
Zero-order correlation matrix among only the independent variables.
Looks at multicollinearity; .7 or higher is not good. State by saying: "Based on Pearson's r, the significant relationship between the IV and the DV is (moderate, high, weak) and (positive, negative)."
What to interpret
Is F significant? At what level?
Report r^2: how much of the variance in the DV the IV accounts for, as a %.
b coefficient: interpret significant coefficients only, as "more likely" or "less likely." Include "when all other IVs are held constant."
Purpose of the b coefficient
Converts IV units into DV units.
Converts the scale of the IV into that of the DV.
Provides an estimate.
The bigger the beta (in absolute value, of course), the stronger the effect.