Bivariate Regression Hypothesis

An independent variable causes change in the dependent variable

Y

Dependent Variable

X

Independent Variable

α

Constant

β

Coefficient

a

Our estimate of α

b

Our estimate of β

Ordinary Least Squares

The method of minimizing the sum of the squared errors

OLS is used if

Our dependent variable is continuous, unbounded, and normally distributed

The Estimates in Bivariate Regression Formulas

Rely on the means of the variables and are estimates of the true, unknown α and β in the population
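These mean-based formulas can be sketched in a few lines of Python (illustrative only; the data are made up):

```python
# Closed-form bivariate OLS estimates (made-up data for illustration):
#   b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
#   a = y_bar - b * x_bar
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Slope b minimizes the sum of squared errors
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
    / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar  # the fitted line passes through (x_bar, y_bar)
```

Note how both estimates are built from the variable means, matching the card above.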

Uncertainty is Reduced

When the sample size increases

Standard Error

Measure of uncertainty

The Smaller the Standard Error

The more confident we are that our estimates are close to the true values

Uncertainty of α

It is not usually directly related to our hypothesis test; it matters more for calculating predictions of the dependent variable

Our Goal in Bivariate Regression

Is to determine whether the independent variable affects the dependent variable (uncertainty around β)

χ² Test

Compares the observed table to the table we would have seen with no relationship

Bivariate Regression Analog of the χ² Test

Compares our observed β to the value of β we would see if there were no relationship

Difference of Means Test

A t-test in which we compare two values and divide the difference by the standard error

Statistical Significance

Looking to reject the null hypothesis

Non Directional Hypothesis

β = 0

Directional Hypothesis

β = 0, or that the relationship is in the other direction

t

Coefficient / standard error of the coefficient

Degrees of Freedom

n − (number of parameters)

Degrees of Freedom For Bivariate Regression

n-2
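The last few cards can be put together in a short sketch: estimate the slope, compute its standard error from the residuals, and form t = coefficient / standard error with n − 2 degrees of freedom. All data are made up for illustration:

```python
import math

# Hedged sketch of the slope t statistic (made-up data)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
a = y_bar - b * x_bar

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
rss = sum(e ** 2 for e in residuals)
df = n - 2                          # bivariate regression: two parameters
se_b = math.sqrt(rss / df / sxx)    # standard error of the slope
t = b / se_b                        # compare to a t critical value at df
```

A larger |t| (relative to the critical value at df = n − 2) gives more confidence in rejecting the null hypothesis that β = 0.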

A model does not do a good job of predicting values of the dependent variable

When none of the actual observed points fall on the regression line

OLS Assumption in Linearity

A straight line adequately represents the relationship in the population. Fitting a linear model to a nonlinear relationship results in biased estimates

OLS Assumptions in Independent Observations

The values of the dependent variable are independent of each other. If they are not, the estimates are unbiased, but the standard errors are typically biased downward (leading us to mistakenly reject the null hypothesis)

Things to Watch Out For in OLS Models

Linearity, Outliers, Leverage, and Influence

Linearity

OLS assumes the relationship between the independent variable and dependent variable is linear

Outliers

When a case has an unusual Y value given its X value

Leverage

When a case has an unusual X value (not always bad)

Influence

A case that is both an outlier and has leverage is said to influence the regression line. This affects both the constant and the slope

Regression Model

Allows us to predict the values of the dependent variable based on the value of an independent variable

Residuals

The difference between the actual value and the predicted value

Goodness of Fit

How well the model predicts the dependent variable

Smaller Residuals

Better the goodness of fit

Two Measures of model fit

Root Mean Squared Error & R²

Root Mean Squared Error

Measure of the typical deviations from the regression line

K

The number of parameters (2 for bivariate regression)
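The RMSE formula from these two cards can be sketched directly (residuals here are made up for illustration):

```python
import math

# RMSE = sqrt(residual sum of squares / (n - k)), where k is the number of
# parameters (k = 2 for bivariate regression). Residuals are made up.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
n, k = len(residuals), 2
rss = sum(e ** 2 for e in residuals)
rmse = math.sqrt(rss / (n - k))  # typical deviation from the regression line
```

Smaller residuals mean a smaller RMSE, i.e., better goodness of fit.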

R²

The proportion of the variance in the dependent variable that our model explains (range: 0 to 1)

The Closer R² Is to 1

The more of the variation our model explains and the better our model is at predicting the dependent variable

Calculating R2

Regression sum of squares / total sum of squares

Calculating Regression Sum of Squares

Total sum of squares - residual sum of squares
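Both R² cards can be checked with a few lines of Python (observed values and predictions below are made up):

```python
# R^2 = regression SS / total SS, where regression SS = total SS - residual SS
# (made-up observed values and predictions)
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
y_bar = sum(y) / len(y)

tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual sum of squares
reg_ss = tss - rss                                     # regression sum of squares
r2 = reg_ss / tss
```

Here the model explains the share reg_ss / tss of the variance in the dependent variable; an R² near 1 would mean residuals near zero.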

The Size of R²

Most important when we are trying to build the model that is the most predictive

Quasi-Experiment

An experiment without random assignment

Goal of Multiple Regression

Can we infer that our independent variable causes our dependent variable? (multiple hypotheses)

Spuriousness

Is there another factor that you're not considering? X ← Z → Y

Z

The variable we include to control for spuriousness

Including Z in Model

Allows us to examine the effect of X holding Z constant

Formula for β1

Uses the variation in X and Y that cannot be explained by Z

β1

The effect of X on Y, controlling for Z

β2

The effect of Z on Y, controlling for X

How do we interpret the coefficient estimates?

The partial effect of the variable: its effect holding all other variables constant

With several independent variables, how do you calculate predicted values?

Vary one variable and hold others at some constant value
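A minimal sketch of this, using hypothetical coefficient estimates (none come from a real model):

```python
# Hypothetical estimates for Y = a + b1*X + b2*Z
a, b1, b2 = 1.0, 0.5, -0.25
z_held = 4.0  # hold Z constant at some value (e.g., its mean)

# Vary X while Z stays fixed to trace out predicted values of Y
predicted = [a + b1 * x + b2 * z_held for x in [0, 1, 2, 3]]
```

Each step of 1 in X changes the prediction by b1, the partial effect of X holding Z constant.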

Dummy Variable

A binary variable that represents absence (0) or presence (1) of a characteristic

Why do we need a reference category?

In order to estimate the coefficients, the variables cannot be "collinear"

How do we select a reference category?

It should be driven by the most interesting comparisons, but it is ultimately up to the researcher

How many dummy variables do you include?

The number of categories minus 1
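The "categories minus 1" rule can be sketched with a hypothetical three-category region variable, arbitrarily taking "north" as the reference category:

```python
# A 3-category variable needs only 2 dummies; the reference category
# ("north", chosen arbitrarily here) gets 0 on both.
regions = ["north", "south", "west", "south"]
dummy_cols = ["south", "west"]  # number of categories minus 1

rows = [{col: int(r == col) for col in dummy_cols} for r in regions]
```

Including a "north" dummy as well would make the dummies sum to 1 for every case, perfectly collinear with the constant, which is why the reference category must be omitted.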

Models for a Categorical (Binary) Dependent Variable

Probit and Logit

Linear Probability Models

Dependent Variable = Categorical

Independent Variable = Continuous

Assumptions of OLS Model: Normality

The errors are normally distributed and this also implies that the dependent variable is normally distributed

Our Dependent Variable must be

Interval level, continuous, and unbounded

Dichotomous

Does not meet the dependent variable assumptions, so OLS does not work

Why doesn't OLS work for a dichotomous dependent variable?

It produces impossible probabilities (below 0 or above 1), and a straight line does not really fit the data

When OLS assumptions fail

We can transform the typical OLS model to estimate models that take into account the distribution of the dependent variable

Logit

Uses a z-test and is used with large sample sizes
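The transformation idea can be sketched with the logistic function (the value below is made up to illustrate an impossible linear prediction):

```python
import math

# A straight line can predict "probabilities" below 0 or above 1; the
# logistic (logit) transform maps any linear prediction into (0, 1).
def logit_prob(xb):
    return 1.0 / (1.0 + math.exp(-xb))

impossible = -0.5           # a linear model could output this as a probability
p = logit_prob(impossible)  # always strictly between 0 and 1
```

This is why logit (and probit, which uses the normal CDF instead) respect the distribution of a dichotomous dependent variable where OLS cannot.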

NB

The substantive meaning of the coefficient is different in each type of regression