POS3713 Exam 4 FSU 2

Bivariate Regression Hypothesis
An independent variable causes change in the dependent variable
Dependent Variable
Independent Variable
Our estimate of ~a (the intercept)
Our estimate of ~B (the slope)
Ordinary Least Squares
The method of minimizing the sum of the squared errors
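Not part of the original cards: a minimal sketch of what "minimizing the sum of the squared errors" produces in the bivariate case, using the closed-form OLS solution and made-up data (all names and values are illustrative):

```python
# Bivariate OLS: choose a and b to minimize the sum of squared errors.
# Closed form: b = cov(x, y) / var(x), a = mean(y) - b * mean(x).

def ols(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Slope: how much y changes for a one-unit change in x
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    # Intercept: predicted y when x = 0
    a = my - b * mx
    return a, b

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = ols(x, y)  # a = 2.2, b = 0.6 for this data
```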
OLS is used if
Our dependent variable is continuous, unbounded, and normally distributed
The Estimates in Bivariate Regression
Rely on the means of the variables and are estimates of the true unknown ~a and ~B in the population
Uncertainty is Reduced
When the sample size increases
Standard Error
Measure of uncertainty
The Smaller the Standard Error
The more confident we are that our estimates are equal to the true values
Uncertainty of ~a
It is not usually directly related to our hypothesis test; it is more important for calculating predictions of the dependent variable
Our Goal in Bivariate Regression
Is to determine if the independent variable affects the dependent variable (uncertainty around ~B)
X2 Test
Compares the observed table to the table we would have seen if there were no relationship
Bivariate Regression Hypothesis Test (analogous to the X2 test)
Compare our observed ~B to the value of ~B if there were no relationship
Difference of Means Test
A t-test in which we compare two values and divide the difference by the standard error
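Not from the original cards: a small sketch of a difference-of-means t-test as described above, with illustrative data. The standard error of the difference here uses the common unequal-variances formula, which is an assumption about which version the course intends:

```python
import math

def diff_of_means_t(sample1, sample2):
    """t = (difference of means) / (standard error of the difference)."""
    n1, n2 = len(sample1), len(sample2)
    m1 = sum(sample1) / n1
    m2 = sum(sample2) / n2
    # Sample variances (n - 1 in the denominator)
    v1 = sum((x - m1) ** 2 for x in sample1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in sample2) / (n2 - 1)
    se = math.sqrt(v1 / n1 + v2 / n2)  # standard error of the difference
    return (m1 - m2) / se

t = diff_of_means_t([5, 6, 7], [1, 2, 3])  # about 4.9
```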
Statistical Significance
Looking to reject the null hypothesis
Non-Directional Hypothesis
The null hypothesis is that ~B = 0
Directional Hypothesis
The null hypothesis is that ~B = 0 or that the relationship is in the opposite direction
Calculating the t-Statistic
Coefficient / Standard error of the coefficient
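Not from the original cards: the coefficient-over-standard-error calculation is just a division, sketched here with made-up numbers:

```python
def t_statistic(coef, se):
    # t = coefficient / standard error of the coefficient
    return coef / se

# Rough rule of thumb: |t| greater than about 2 suggests significance
# at the .05 level for reasonably large samples.
t = t_statistic(0.6, 0.25)  # 2.4 -> reject the null that ~B = 0
```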
Degrees of Freedom
n-# of parameters
Degrees of Freedom for Bivariate Regression
n - 2 (two parameters: ~a and ~B)
A model does not do a good job of predicting values of the dependent variable
When none of the actual observed points fall on that line
OLS Assumption of Linearity
A straight line adequately represents the relationship in the population. Fitting a linear model to a nonlinear relationship results in biased estimates
OLS Assumption of Independent Observations
The values of the dependent variable are independent of each other. If this is violated, the estimates are unbiased, but the standard errors are typically biased downward (leading us to mistakenly reject the null hypothesis)
Things to Watch Out For in OLS Models
Linearity, Outliers, Leverage, and Influence
Linearity
OLS assumes the relationship between the independent variable and dependent variable is linear
Outlier
When a case has an unusual Y value given its X value
Leverage
When a case has an unusual X value (not always bad)
Influence
A case that is both an outlier and has leverage is said to influence the regression line. This affects both the constant and the slope
Regression Model
Allows us to predict the values of the dependent variable based on the value of an independent variable
Residual
The difference between the actual value and the predicted value
Goodness of Fit
How well the model predicts the dependent variable
Smaller Residuals
Better the goodness of fit
Two Measures of model fit
Root Mean Squared Error & R2
Root Mean Squared Error
Measure of the typical deviations from the regression line
Number of Parameters
The number of parameters in the model (2 for bivariate regression: ~a and ~B)
R2
The proportion of the variance in the dependent variable that our model explains (range = 0-1)
The Closer R2 is to 1
The more of the variation our model explains and the better our model is at predicting the dependent variable
Calculating R2
Regression sum of squares / Total sum of squares
Calculating Regression Sum of Squares
Total sum of squares - residual sum of squares
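Not from the original cards: the two fit measures above can be sketched together, using the sums of squares exactly as the cards define them (the data are made up, and k is the number of estimated parameters):

```python
import math

def fit_statistics(y, y_hat, k=2):
    """RMSE and R-squared from actual and predicted values.
    k = number of estimated parameters (2 for bivariate regression)."""
    n = len(y)
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # residual sum of squares
    reg_ss = tss - rss                                      # regression sum of squares
    rmse = math.sqrt(rss / (n - k))                         # typical deviation from the line
    r2 = reg_ss / tss                                       # share of variance explained
    return rmse, r2

rmse, r2 = fit_statistics([2, 4, 5, 4, 5], [2.8, 3.4, 4.0, 4.6, 5.2])
```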
The Size of R2
Most important when we are trying to build the model that is the most predictive
Quasi Experiment
An experiment without random assignment
Goal of Multiple Regression
Can we infer that our independent variable causes our dependent variable (multiple hypotheses)
Spuriousness
Is there another factor that you're not considering? X <- Z -> Y
Controlling for spuriousness
Including Z in Model
Allows us to examine the effect of X holding Z constant
Formula for ~B1
Uses the variation in X and Y that cannot be explained by Z
Interpretation of ~B1
The effect of X on Y controlling for Z
Interpretation of ~B2
The effect of Z on Y controlling for X
How do we interpret the coefficient estimates?
The partial effect of the variable, i.e., its effect holding all other variables constant
With several independent variables, how do you calculate predicted values?
Vary one variable and hold others at some constant value
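Not from the original cards: "vary one variable and hold others at some constant value" can be sketched like this, with entirely hypothetical coefficient estimates:

```python
# Predicted values from a multiple regression: y = a + b1*x1 + b2*x2.
# The coefficients and the held-constant value are made up for illustration.

def predict(a, betas, xs):
    return a + sum(b * x for b, x in zip(betas, xs))

a = 1.0
betas = [0.5, -2.0]  # hypothetical estimates for x1 and x2

# Vary x1 and hold x2 at a (hypothetical) mean of 3.0
predictions = [predict(a, betas, [x1, 3.0]) for x1 in range(0, 4)]
# predictions: [-5.0, -4.5, -4.0, -3.5]
```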
Dummy Variable
A binary variable that represents absence (0) or presence (1) of a characteristic
Why do we need a reference category?
In order to estimate coefficients, variables cannot be "collinear"
How do we select a reference category?
It should be driven by the most interesting comparisons, but it is ultimately up to the researcher
How many dummy variables do you include?
The number of categories minus 1
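Not from the original cards: a small sketch of the categories-minus-one rule, dropping a reference category to avoid perfect collinearity (the variable and category names are invented):

```python
# Turn a categorical variable into dummy variables, dropping one
# reference category so the dummies are not perfectly collinear.

def make_dummies(values, reference):
    categories = sorted(set(values) - {reference})  # categories - 1 dummies
    return {c: [1 if v == c else 0 for v in values] for c in categories}

regions = ["South", "North", "West", "South"]
dummies = make_dummies(regions, reference="South")
# {'North': [0, 1, 0, 0], 'West': [0, 0, 1, 0]}
```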
Models for a Dichotomous Dependent Variable
Probit and Logit
Linear Probability Models
Dependent Variable = Categorical
Independent Variable = Continuous
Assumptions of OLS Model: Normality
The errors are normally distributed and this also implies that the dependent variable is normally distributed
Our Dependent Variable must be
Interval level, continuous, and unbounded
Dichotomous Dependent Variable
Does not meet the dependent variable assumptions, so OLS does not work
Why OLS does not work for dichotomous?
Impossible probabilities and the straight line does not really fit the data
When OLS assumptions fail
We can transform the typical OLS model to estimate models that take into account the distribution of the dependent variable
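Not from the original cards: one such transformation is the logistic (logit) function, which maps any value of a + b*x into the (0, 1) range, avoiding the impossible probabilities OLS can produce. The coefficient values are made up:

```python
import math

def logit_probability(a, b, x):
    """Logistic transformation: maps a + b*x into the (0, 1) range,
    unlike a straight OLS line, which can predict impossible probabilities."""
    return 1 / (1 + math.exp(-(a + b * x)))

# With hypothetical estimates a = -2, b = 1:
p_lo = logit_probability(-2, 1, 0)  # about 0.12
p_hi = logit_probability(-2, 1, 5)  # about 0.95
```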
Hypothesis Tests for Probit and Logit
Use a Z test and are used with large sample sizes
Interpreting Probit and Logit Coefficients
The substantive meaning of the coefficient is different in each type of regression