# Methods-Stats

## 50 terms

### Provide the formula for bivariate regression and label the parts.

Y = a + bX
Y = dependent variable
a = intercept
b = slope
X = independent variable
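The slope and intercept above can be estimated by ordinary least squares. A minimal sketch using only the standard library; the data points are made up for illustration.

```python
from statistics import mean

x = [1, 2, 3, 4, 5]            # independent variable (X)
y = [2.1, 3.9, 6.2, 8.0, 9.8]  # dependent variable (Y)

x_bar, y_bar = mean(x), mean(y)

# b (slope) = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)

# a (intercept) = y_bar - b * x_bar
a = y_bar - b * x_bar

print(f"Y = {a:.2f} + {b:.2f}X")
```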

### Provide the formula for 95% confidence interval and label the parts.

x̄ ± 1.96 × SE
x̄ = mean
1.96 = z-score associated with a 95% CI
SE = s/√n = standard error (sample standard deviation divided by the square root of the sample size)
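A minimal sketch of x̄ ± 1.96 × SE with the standard library; the sample values are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

sample = [12, 15, 14, 10, 13, 16, 11, 14]
n = len(sample)

x_bar = mean(sample)
se = stdev(sample) / sqrt(n)   # SE = s / sqrt(n)

# 1.96 is the z-score for a 95% confidence interval
lower, upper = x_bar - 1.96 * se, x_bar + 1.96 * se
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```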

### Mode

The most frequently occurring case.

### Median

Score in a rank-ordered distribution of scores that is exactly in the middle, such that 50% are higher and 50% are lower.

### Mean

The average of a set of scores.

### Range

Shows the full dispersion of numbers in the sample.

### Variance

The average of the squared deviations of the cases from the sample mean; demonstrates the variation among all of the cases in the sample.

### Standard Deviation

Displays the average distance of the cases in the sample to the mean of the sample.
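The descriptive statistics above (mode, median, mean, range, variance, standard deviation) can all be computed with the standard library; the scores are made up for illustration.

```python
from statistics import mode, median, mean, variance, stdev

scores = [4, 8, 6, 5, 3, 8, 9, 7, 8]

print("mode:", mode(scores))                # most frequently occurring score
print("median:", median(scores))            # middle score of the ranked list
print("mean:", mean(scores))                # arithmetic average
print("range:", max(scores) - min(scores))  # full dispersion
print("variance:", variance(scores))        # sample variance
print("std dev:", stdev(scores))            # average distance from the mean
```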

### Levels of Measurement

Nominal: numbers assigned to a set of categories (gender, race, religion)
Ordinal: numbers assigned to rank-ordered categories
Interval: assumption of equal distance between objects on the scale (e.g., temperature in °F)
Ratio: variables that have a natural zero (weight, length)

### Inferential Statistics

Concerned with making predictions or inferences about a population from observation and analyses of a sample.

### Negatively skewed distribution

A distribution with few extremely low values

### Positively skewed distribution

A distribution with few extremely high values

### Interquartile Range (IQR)

The width of the middle 50% of the distribution, defined as the difference between the lower and upper quartiles.

### Normal distribution

A bell-shaped, symmetrical theoretical distribution in which the mean, median, and mode all coincide at the peak, with frequencies gradually decreasing at both ends of the curve.

### Standard (Z) score

The number of standard deviations that a given raw score is above or below the mean.

### Sampling Error

The discrepancy between a sample estimate of a population parameter and the real population parameter.

### Cronbach's Alpha

Should be at least .70. If alpha is negative, there is an inverse relationship among the items.

### How to improve reliability

1. Make sure the instructions are standardized and clear across settings.
2. Increase the number of items or observations.
3. Delete unclear items.
4. Moderate the easiness and difficulty of tests.
5. Minimize the effect of external events.

### How to find percentile rank

1. Convert the raw score to a z-score.
2. Find the area beyond z in the standard normal table.
3. Subtract that area from 1.00 for the percentile; multiply by 100 for the percent.
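The steps above can be sketched with the standard library, using `math.erf` to get the standard normal CDF in place of a printed table. The raw score, mean, and SD are made up for illustration.

```python
from math import erf, sqrt

raw, mu, sd = 85, 70, 10

z = (raw - mu) / sd                     # 1. raw score -> z-score
cdf = 0.5 * (1 + erf(z / sqrt(2)))      # area below z (standard normal CDF)
area_beyond = 1 - cdf                   # 2. area beyond z
percentile = (1.0 - area_beyond) * 100  # 3. subtract from 1.00, times 100

print(f"z = {z:.2f}, percentile rank = {percentile:.1f}")
```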

### Central Limit Theorem

If all possible random samples of size N are drawn from a population with a mean and a standard deviation, then as N becomes larger, the sampling distribution of the mean approaches normal. With sufficient sample size, the sampling distribution will be normal regardless of the shape of the population distribution.
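A minimal simulation sketch of the theorem: means of repeated samples from a strongly skewed (exponential) population still cluster around the population mean once N is reasonably large. The sample sizes and seed are arbitrary.

```python
import random
from statistics import mean

random.seed(0)

# Exponential population with rate 1 has population mean 1.0 but is
# heavily right-skewed.
sample_means = [mean(random.expovariate(1.0) for _ in range(100))
                for _ in range(2000)]

grand_mean = mean(sample_means)
print(f"mean of sample means: {grand_mean:.3f} (population mean = 1.0)")
```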

### What is the 95% confidence interval

There is a 95% probability that a specified interval will contain the population mean. A 95% interval is less confident, but more precise (narrower), than a 99% interval.

### Steps of Hypothesis Testing

1. Making assumptions.
2. Stating the hypotheses.
3. Selecting a test statistic.
4. Computing the test statistic.
5. Interpreting the results.

### Z vs. T

We use the Z statistic when we know the population variance. In most situations we don't have this information, so we rely on the t statistic, estimating population parameters using information from the sample.

### Alpha

The level of probability at which the null hypothesis is rejected. Usually set at .05.

### Type 1 Error

Rejecting the null hypothesis when it is in fact true.

### Type 2 Error

Failing to reject the null hypothesis when it is in fact false.

### T-Test

Appropriate when you have a single interval DV and a dichotomous IV and want to test the difference or compare means. Used when sample sizes are small.

### Independent Sample T-Test

Used to compare the means of two independent sampled groups.

### Paired Sample T-Test

Compares means when two groups are correlated, as in before-after, repeated measures, etc.

### T-Test Formula

The difference between two means divided by the estimated standard error of the difference.
If the t-value is greater than 1.96, the difference between the means is significantly different from zero.
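A minimal sketch of that formula for an independent-samples t-test with equal variances assumed (pooled variance); the group scores are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

group1 = [23, 25, 28, 30, 27, 26]
group2 = [20, 22, 19, 24, 21, 23]

n1, n2 = len(group1), len(group2)

# Pooled variance combines the spread of both groups.
sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
se_diff = sqrt(sp2 * (1 / n1 + 1 / n2))  # estimated SE of the difference

t = (mean(group1) - mean(group2)) / se_diff
print(f"t = {t:.2f}")  # compare against critical t with df = n1 + n2 - 2
```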

### ANOVA

Designed to test for a significant relationship between two variables across two or more samples (i.e., differences among group means).
Assumes independent random sampling, an interval-ratio DV, a normally distributed population, and equal variances.

### F-statistic

Ratio of between group variance to within group variance.
F = (SSB/df_b) / (SSW/df_w) = (mean square between) / (mean square within)

### F-critical

The F-score associated with a particular alpha level and df.
If F-obtained is greater than F-critical, reject the null. If F-obtained is less than F-critical, fail to reject the null.

### F-obtained

Test statistic computed by the ratio for between-group to within-group variance.

### Effect size for ANOVA

Eta² = SSB/SST
The closer it is to 1, the better.
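The F ratio and Eta² formulas above can be computed by hand. A minimal sketch for three made-up groups.

```python
from statistics import mean

groups = [[5, 7, 6, 8], [9, 11, 10, 12], [4, 6, 5, 5]]

all_scores = [x for g in groups for x in g]
grand = mean(all_scores)
k, n = len(groups), len(all_scores)

ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)  # between-group
ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)    # within-group
sst = ssb + ssw

f = (ssb / (k - 1)) / (ssw / (n - k))  # mean square between / mean square within
eta2 = ssb / sst                       # effect size

print(f"F = {f:.2f}, Eta2 = {eta2:.2f}")
```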

### Correlations

.2 = weak, .5 = medium, .8 = strong

### Bivariate Regression

One IV and one DV, ratio or interval only

### Pearson's Correlation Coefficient (r)

Ranges from -1 to +1, the sign indicates the direction, the closer to 1 the stronger the association between x and y.

### Coefficient of determination (r^2)

Proportion of the total variation in the DV explained by the IV.
r² = (regression sum of squares) / (total sum of squares)
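A minimal sketch of Pearson's r and r² computed from made-up data, using the definitional formula (covariation over the product of the deviation magnitudes).

```python
from math import sqrt
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sx = sqrt(sum((xi - x_bar) ** 2 for xi in x))
sy = sqrt(sum((yi - y_bar) ** 2 for yi in y))

r = cov / (sx * sy)
print(f"r = {r:.2f}, r2 = {r ** 2:.2f}")  # r2 = share of Y's variation explained
```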

### Regression Assumptions

No specification error.
No measurement error.
Error Terms have zero mean.
Homoskedasticity.
No autocorrelation.
IV uncorrelated with error term.
Error term normally distributed.
No perfect multicollinearity.

### No specification error.

The relationship between X and Y is linear.
No relevant IVs have been excluded.
No irrelevant IVs have been included.
If we exclude a relevant IV, we violate the assumption that the IVs are uncorrelated with the error term. If we include irrelevant IVs, we get higher variances of the estimated coefficients.

### No Measurement Error.

The variables X and Y are accurately measured.
The DV must be numeric, continuous (interval or ratio), and unbounded. The IV must be numeric, and continuous or dichotomous. If this assumption is violated, the estimates will be inefficient.

### Error Terms Have Zero Mean

The error terms should have a mean of zero. Consequences if not: biased intercept, but unbiased slope coefficients.

### Homoskedasticity

Variance of the error term around the regression line is constant.
Pure heteroskedasticity: the variance is not constant, with a wide range between the largest and smallest values.
Impure heteroskedasticity: caused by leaving out important IVs or including irrelevant ones.
Consequences: no bias in coefficient estimates, but they no longer have minimum variance; OLS underestimates variances and SEs.

### No Autocorrelation

The error terms are uncorrelated with one another.
Pure serial correlation: occurs in correctly specified models.
Impure serial correlation: caused by specification error; underestimates SEs and increases the chance of rejecting the null when you shouldn't.

### The IV is uncorrelated with the error term.

Consequences:
Biased intercept
Biased Coefficient

### The error term is normally distributed

Consequences:
Significance tests will be invalid.

### No Perfect Multicollinearity

The IVs may be correlated with the DV, but not with each other.
Perfect multicollinearity: the regression won't run.
Severe multicollinearity: the regression will run, and the assumption may not be technically violated.

Consequences:
Estimates unbiased.
Variance and SE increase.
t-scores are insignificant.
Estimates will become sensitive to changes in specification.
Overall fit will be unaffected.
The worse the multicollinearity, the worse the consequences will be.

### ZOCMAOTIV

Zero order correlation matrix among only the independent variables.
Looks at multicollinearity; .7 or higher is not good. State it by saying: "Based on Pearson's r, the significant relationship between the IV and the DV is (moderate, high, weak) and (positive, negative)."

### What to interpret

Is F significant? At what level?
Report r²: how much of the variance in the DV the IVs account for, as a percentage.
b coefficient: interpret significant ones only, as "more likely" or "less likely". Include "when all other IVs are held constant."
Purposes of the b coefficient:
1. Converts IV units into DV units.
2. Converts the scale of the IV into that of the DV.
3. Minimizes SSE.
4. Provides an estimate.
The bigger the beta (in absolute value), the stronger the effect.