What does the OLS estimate minimize?
The sum of the squared prediction mistakes over the n observations: sum_{i=1}^{n} (Yi - b0 - b1*Xi)^2.
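A minimal sketch of this minimization, assuming simulated data (the sample, the seed, and the helper name `ssr` are illustrative, not from the notes). It computes the closed-form OLS coefficients and checks that nudging either coefficient raises the sum of squared mistakes.

```python
import numpy as np

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

def ssr(b0, b1):
    """Sum of squared prediction mistakes for candidate coefficients b0, b1."""
    return float(np.sum((y - b0 - b1 * x) ** 2))

# Closed-form OLS solution for a single regressor.
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

# Moving away from the OLS values can only increase the sum of squares.
print(ssr(b0_hat, b1_hat) < ssr(b0_hat + 0.1, b1_hat))
print(ssr(b0_hat, b1_hat) < ssr(b0_hat, b1_hat - 0.1))
```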
How do we obtain the OLS estimator Bhat1 in a regression of Y on X1 and X2?
1) Regress Y on X2, including an intercept; the residuals are Yres. 2) Regress X1 on X2, including an intercept; the residuals are Xres. 3) The estimator is Bhat1 = sum_{i=1}^{n} (Xres_i * Yres_i) / sum_{i=1}^{n} (Xres_i)^2.
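The three steps above can be sketched as follows, assuming simulated data and a hypothetical helper `residuals`. The sketch compares the partialled-out estimate with the coefficient on X1 from the full regression of Y on X1 and X2; the two should agree up to rounding.

```python
import numpy as np

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

def residuals(dep, reg):
    """Residuals from regressing dep on reg, including an intercept."""
    Z = np.column_stack([np.ones(len(reg)), reg])
    coef, *_ = np.linalg.lstsq(Z, dep, rcond=None)
    return dep - Z @ coef

y_res = residuals(y, x2)    # step 1
x_res = residuals(x1, x2)   # step 2
b1_partial = np.sum(x_res * y_res) / np.sum(x_res ** 2)  # step 3

# Coefficient on X1 from the full multiple regression, for comparison.
X = np.column_stack([np.ones(n), x1, x2])
b_full, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.isclose(b1_partial, b_full[1]))
```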
What is the regression R^2? What is it mathematically (2)?
A number between 0 and 1 that is the fraction of the sample variance of Yi explained by (or predicted by) Xi. Mathematically, it is the explained sum of squares, ESS = sum (Yhat_i - Ybar)^2 (the sum of squared deviations of the predicted values Yhat_i from their average), over the total sum of squares, TSS = sum (Yi - Ybar)^2 (the sum of squared deviations of Yi from its average). Equivalently, R^2 = ESS/TSS = 1 - SSR/TSS.
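A short sketch, assuming simulated data, that computes R^2 both ways. With an intercept in the regression, ESS + SSR = TSS, so the two expressions should match.

```python
import numpy as np

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 3.0 + 1.5 * x + rng.normal(size=100)

X = np.column_stack([np.ones(len(x)), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

ess = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
ssr = np.sum((y - y_hat) ** 2)          # sum of squared residuals
tss = np.sum((y - y.mean()) ** 2)       # total sum of squares

r2_from_ess = ess / tss
r2_from_ssr = 1.0 - ssr / tss
print(np.isclose(r2_from_ess, r2_from_ssr))
```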
What are the 3 OLS assumptions? What happens if they hold?
1) E(ui given Xi) = 0, the conditional mean assumption. 2) (Xi, Yi), i = 1, ..., n, are i.i.d. 3) Large outliers are unlikely. If they hold, the estimators Bhat0 and Bhat1 are unbiased and consistent, and they are approximately normally distributed when the sample is large.
What are the null and two-sided alternative hypotheses for B1? How is it tested differently from a one-sided (greater than) alternative?
H0: B1 = B1,0 vs. H1: B1 =/ B1,0. Against the two-sided alternative we reject for a large t-stat in either direction, instead of a large t on one side only. We reject if abs(tact) > 1.96 at the 5% significance level.
What are the three steps of hypothesis testing B1?
1) Compute the standard error of Bhat1. 2) Compute the t-statistic. 3) Compute the p-value. Reject at 5% if the p-value is less than 0.05, or equivalently if the absolute value of t is greater than 1.96. The SE, t and p are typically computed automatically by regression software.
What are the null and one-sided alternative hypotheses for B1? How is it tested differently from the two-sided case?
H0: B1 = B1,0 vs. H1: B1 < B1,0. Against this one-sided alternative we reject for a large negative t, but not for a large positive t. Instead of rejecting if abs(tact) > 1.96, the hypothesis is rejected at the 5% significance level if tact < -1.645.
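A small sketch of the two-sided and one-sided (less than) tests, using the large-sample normal approximation. The estimate and standard error below are hypothetical numbers chosen for illustration, and `normal_cdf` and `t_statistic` are helper names introduced here.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, built from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def t_statistic(beta_hat, beta_null, se):
    """t = (estimate - hypothesized value) / standard error."""
    return (beta_hat - beta_null) / se

# Hypothetical estimate and standard error.
beta_hat, beta_null, se = -2.28, 0.0, 0.52
t_act = t_statistic(beta_hat, beta_null, se)

# Two-sided: large |t| in either direction counts against H0.
p_two_sided = 2.0 * (1.0 - normal_cdf(abs(t_act)))
reject_two_sided = abs(t_act) > 1.96

# One-sided (H1: B1 < B1,0): only a large negative t counts against H0.
p_left = normal_cdf(t_act)
reject_left = t_act < -1.645

print(t_act, p_two_sided, reject_two_sided, p_left, reject_left)
```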
How to construct a confidence interval for B1 at 99, 95, and 90%?
[Bhat1 - c*SE(Bhat1), Bhat1 + c*SE(Bhat1)], where c = 2.58 for 99%, 1.96 for 95%, and 1.64 for 90%.
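As a quick worked example, the interval can be computed with a small helper. The function name `confidence_interval` and the input numbers are hypothetical; the critical values are the ones listed above.

```python
def confidence_interval(beta_hat, se, level):
    """Large-sample confidence interval for a coefficient at 99, 95 or 90%."""
    critical_values = {99: 2.58, 95: 1.96, 90: 1.64}
    c = critical_values[level]
    return beta_hat - c * se, beta_hat + c * se

# Hypothetical estimate and standard error.
beta_hat, se = -2.28, 0.52
for level in (99, 95, 90):
    low, high = confidence_interval(beta_hat, se, level)
    print(f"{level}%: [{low:.3f}, {high:.3f}]")
```

A higher confidence level uses a larger critical value, so the 99% interval is the widest of the three.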
What is the population regression model with a dummy variable Di as the regressor? How is B1 different here than when X is a continuous variable? What is B0 here?
Yi = B0 + B1Di + ui. We can no longer speak of B1 as the slope; instead we call it the coefficient multiplying Di, or the coefficient on Di. It is the difference between the average value of Y when Di = 1 and when Di = 0. B0 is the average value of Y when Di = 0.
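A minimal sketch, assuming simulated data, showing that with a single dummy regressor the OLS intercept equals the sample mean of Y in the Di = 0 group and the coefficient on Di equals the difference in group means.

```python
import numpy as np

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(3)
n = 300
d = (np.arange(n) % 2).astype(float)        # dummy: alternates 0, 1, 0, 1, ...
y = 10.0 + 4.0 * d + rng.normal(size=n)

X = np.column_stack([np.ones(n), d])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0_hat, b1_hat = coef

mean_d0 = y[d == 0].mean()
mean_d1 = y[d == 1].mean()
print(np.isclose(b0_hat, mean_d0), np.isclose(b1_hat, mean_d1 - mean_d0))
```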
What is heteroskedasticity?
When the variance of the conditional distribution of ui given Xi is not constant for i = 1, ..., n and depends on Xi.
What is omitted variable bias?
When the regressor is correlated with a variable that has been omitted from the analysis and that determines, in part, the dependent variable.
What assumption does omitted variable bias violate?
The first least squares assumption, E(ui given Xi) = 0. Because one of the factors in the error term is correlated with X, the error term itself is correlated with X, so its conditional mean given X is not zero.
What are B0 and B1 in the population regression line with multiple regressors?
B0 is the expected value of Y when all the Xs equal 0. B1 is the expected change in Yi resulting from changing X1i by one unit, holding constant X2, ..., Xk.
What is the residual? How does it differ from the statistical error?
A residual (or fitting error) is an observable estimate of the unobservable statistical error. As an example, take men's heights and suppose we have a random sample of n men. The sample mean could serve as a good estimator of the population mean. Then we have:
The difference between the height of each man in the sample and the unobservable population mean is a statistical error, whereas
The difference between the height of each man in the sample and the observable sample mean is a residual.
Note that the sum of the residuals within a random sample is necessarily zero, and thus the residuals are necessarily not independent. The statistical errors on the other hand are independent, and their sum within the random sample is almost surely not zero.
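The heights example can be sketched numerically. The population mean and spread below are hypothetical values chosen only for illustration; in practice the population mean, and therefore the statistical errors, would not be observable.

```python
import numpy as np

# Hypothetical population parameters (unobservable in a real sample).
rng = np.random.default_rng(8)
population_mean = 175.0
heights = rng.normal(loc=population_mean, scale=7.0, size=50)

errors = heights - population_mean      # statistical errors
residuals = heights - heights.mean()    # residuals around the sample mean

# Residuals sum to zero up to floating-point rounding; errors do not.
print(residuals.sum(), errors.sum())
```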
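What is the adjusted R^2?
A modified R^2 that does not necessarily increase when a regressor is added. It deflates the R^2 by adding a "penalty" for each additional variable, so it rises only when the new variable improves the fit enough to offset the penalty.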
What is the fourth additional assumption for multiple regression?
No perfect multicollinearity: no regressor is an exact linear function of the other regressors.
What is "the dummy variable trap"?
If there are G binary variables, if each observation falls into one and only one category, if there is an intercept in the regression, and if all G binary variables are included as regressors, then the regression will fail because of perfect multicollinearity.
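A small sketch of the trap, assuming three hypothetical categories. With an intercept and all G dummies, the dummy columns sum to the intercept column, so the design matrix loses a rank; dropping one dummy restores full column rank.

```python
import numpy as np

# Hypothetical data: 60 observations, each in exactly one of G = 3 categories.
n, G = 60, 3
group = np.arange(n) % G
dummies = (group[:, None] == np.arange(G)).astype(float)   # one column per category

X_trap = np.column_stack([np.ones(n), dummies])         # intercept + all G dummies
X_ok = np.column_stack([np.ones(n), dummies[:, 1:]])    # intercept + G - 1 dummies

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
print(X_trap.shape[1], rank_trap)   # more columns than rank: perfect multicollinearity
print(X_ok.shape[1], rank_ok)       # columns equal rank: coefficients can be computed
```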
What do the regression F-test hypotheses look like?
H0: All q restrictions hold (for example, B1 = 0 and B2 = 0). H1: At least one restriction does not hold.
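What is the F-stat formula in terms of R^2? (This is the homoskedasticity-only form; the heteroskedasticity-robust F-stat has no simple formula and is computed by regression software.)
F = [(R^2u - R^2r)/q] / [(1 - R^2u)/(n - ku - 1)], where u stands for unrestricted (the regression with all the Xs), r stands for restricted (the regression that imposes the null, that is, without the Xs being tested), q is the number of restrictions, and ku is the number of regressors in the unrestricted regression.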
When we need the SSRr to calculate the homoskedastic F-stat formula, how do we find it?
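1) If the null sets all the slope coefficients to zero, the restricted model has only an intercept, so SSRr = TSS. We use the TSSu, because the total sum of squares in the restricted and the unrestricted model will always be the same.
2) Otherwise, use the SSR from the restricted regression, which keeps only the regressors whose coefficients are not set to zero under the null hypothesis.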
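Why does SSRr = TSSr (when the restricted model has only an intercept)?
R^2r = 1 - SSRr/TSSr. With R^2r = 0: 0 = 1 - SSRr/TSSr, so SSRr/TSSr = 1 and SSRr = TSSr. SSRr = TSSr when R^2r = 0, because the restricted model explains none of the variation in Y.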
What is the homoskedasticity-assumed F-stat formula in terms of SSR? It is valid only when the errors are for sure homoskedastic.
F = [(SSRr - SSRu)/q] / [SSRu/(n - ku - 1)]
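A sketch, assuming simulated data and a hypothetical helper `fit`, that computes the F-stat from the SSR formula and from the R^2 formula. Because R^2 = 1 - SSR/TSS and TSS is the same in both regressions, the two formulas are algebraically identical and both are homoskedasticity-only statistics.

```python
import numpy as np

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(5)
n = 400
X = rng.normal(size=(n, 3))
y = 1.0 + 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)

def fit(regressors):
    """Return (SSR, R^2) from regressing y on the given columns plus an intercept."""
    Z = np.column_stack([np.ones(n), regressors])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    ssr = np.sum((y - Z @ coef) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return ssr, 1.0 - ssr / tss

# Null: the coefficients on the second and third regressors are zero (q = 2).
ssr_u, r2_u = fit(X)          # unrestricted: all three regressors
ssr_r, r2_r = fit(X[:, :1])   # restricted: only the first regressor
q, k_u = 2, 3

f_from_ssr = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k_u - 1))
f_from_r2 = ((r2_u - r2_r) / q) / ((1.0 - r2_u) / (n - k_u - 1))
print(np.isclose(f_from_ssr, f_from_r2))
```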
What happens when we assume homoskedasticity in a heteroskedastic model?
Our estimates of B1 and B0 will not be biased, but the homoskedasticity-only standard errors are wrong, typically too small, so our F-statistic will be too large and we may over-reject.
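A sketch of the comparison, assuming a simulated design in which the error spread grows with the absolute value of X. The robust standard error here is the simple HC0 "sandwich" form, which is one common choice and may differ slightly from what a given regression package reports.

```python
import numpy as np

# Hypothetical heteroskedastic data: the error spread grows with |x|.
rng = np.random.default_rng(9)
n = 2000
x = rng.normal(size=n)
u = np.abs(x) * rng.normal(size=n)
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Homoskedasticity-only variance: s^2 * (X'X)^-1.
s2 = np.sum(resid ** 2) / (n - 2)
se_homo = np.sqrt(s2 * XtX_inv[1, 1])

# Heteroskedasticity-robust (HC0) variance: (X'X)^-1 [sum u_i^2 x_i x_i'] (X'X)^-1.
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt((XtX_inv @ meat @ XtX_inv)[1, 1])

print(se_homo, se_robust)
```

In this design the homoskedasticity-only standard error on the slope understates the robust one, which is the over-rejection risk described above.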
If B1~ is the single regression estimate of y on X1, and Bhat1 is the multiple regression estimate from y on X1, X2 and X3, and X1 is highly correlated with X2 and X3, and X2 and X3 have large effects on Y, would you expect B1~ and Bhat1 to be similar?
No. The single regression omits X2 and X3, which violates the conditional mean assumption, so B1~ suffers from omitted variable bias.
We are concerned about an omitted variable if... (2)
1) the omitted variable affects y (it is part of ui), and 2) it is correlated with X1. (Because then we are violating the conditional mean assumption.)
If B1~ is the single regression estimate of y on X1, and Bhat1 is the multiple regression estimate from y on X1, X2 and X3, and X1 is not correlated with X2 and X3, and X2 and X3 have large effects on Y and are correlated with each other, would you expect B1~ and Bhat1 to be similar?
Yes, because X2 and X3 must be correlated with X1 to affect our estimator of B1. Since they are not, omitting them does not bias B1~.
What is the omitted variable bias formula?
B1~ = Bhat1 + Bhat2*d~1, where B1~ is the estimate from the simple regression of Y on X1, Bhat1 and Bhat2 are the estimates of B1 and B2 from the regression of Y on X1 and X2, and d~1 is the slope coefficient on X1 in the regression of X2 on X1.
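The formula holds exactly in any sample, which can be checked with a short sketch. The data are simulated and the helper `ols` is a hypothetical name introduced here.

```python
import numpy as np

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(6)
n = 300
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)          # X2 is correlated with X1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(dep, *regs):
    """OLS coefficients (intercept first) from regressing dep on regs."""
    Z = np.column_stack([np.ones(n), *regs])
    coef, *_ = np.linalg.lstsq(Z, dep, rcond=None)
    return coef

b_tilde1 = ols(y, x1)[1]            # simple regression of Y on X1 (omits X2)
_, b_hat1, b_hat2 = ols(y, x1, x2)  # regression of Y on X1 and X2
d_tilde1 = ols(x2, x1)[1]           # slope from regressing X2 on X1

print(np.isclose(b_tilde1, b_hat1 + b_hat2 * d_tilde1))
```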
When the estimated slope coefficient in the simple regression model, Bhat1, is zero, then R^2 =
0, because X explains none of the variation in Y: ESS = 0, so SSR = TSS, and R^2 = ESS/TSS = 0.
The sum of the OLS residuals is ____, thus the sample average of the OLS residuals is ____.
Zero; zero (when the regression includes an intercept).
What assumption is violated by omitted variable bias?
The conditional mean assumption, E(ui given Xi) = 0.
The OLS residual for the ith observation is...
the difference between Yi and its OLS predicted value; that is, the OLS residual is uhat_i = Yi - Yhat_i.
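T or F? Adding irrelevant regressors will not bias the relevant coefficients. It will not increase the standard errors.
True for the first: irrelevant regressors do not bias the relevant coefficients. False for the second: irrelevant regressors that are correlated with the relevant ones increase the standard errors.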
E(ui given Xi) = 0 says that (technical language)
the conditional distribution of the error given the explanatory variable has a zero mean.
Multiplying the dependent variable by 100 and the explanatory variable by 100,000 leaves the ___ the same.
The regression R^2.
In the presence of heteroskedasticity and assuming the usual least squares assumptions hold, the OLS estimator is... efficient; blue; unbiased and consistent; unbiased and not consistent.
unbiased and consistent.
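T or F. When you have an omitted variable problem, the assumption that E(ui given X) = 0 is violated. This implies that the sum of the residuals is no longer zero.
False. The assumption is violated, but the OLS residuals still sum to zero by construction whenever the regression includes an intercept.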
Which of the following can cause OLS estimators to be biased: heteroskedasticity, omitting an important variable, or a sample correlation coefficient of .95 between two independent variables included in the model?
Only omitting an important variable. (Heteroskedasticity affects the standard errors, not the bias. The third would cause imperfect multicollinearity, which would give imprecise (high SE) estimates of B but not biased ones.)
Why can't the coefficients of the regression below be calculated? y = B0 + B1male + B2female
Male perfectly predicts female (male + female = 1 for every observation). Perfect collinearity means no variation remains in one of the Xs after the other is held constant.
Explain why an experimental method eliminates omitted variable bias.
X won't be correlated with the error term, because the treatment is assigned at random rather than self-selected, and hence the relationship between Y and X can be interpreted as causal (the estimate is unbiased).
What is the formula for calculating R^2?
R^2 = ESS/TSS = 1 - (RSS/TSS), where RSS is the residual sum of squares (SSR). In regression output this is the "Model" SS divided by the "Total" SS.
T or F: The slope estimator, Bhat1, has a smaller standard error, other things equal, if there is more variation in X.
True. More variation in X makes the variance of the slope estimator smaller.
Does it make sense to assume homoskedasticity in a regression of wages on education? What error might you commit if you assume it?
No. The variance of wages is likely higher at high education levels, because increasing education increases labor market options, which include both high and low wage opportunities. You may over-reject the null in hypothesis testing (for example, the null that education has no effect on wages), because the homoskedasticity-only standard errors can be too small.
What is the "adjusted" R^2 mathematically? How does it work? (3 important factors)
Rbar^2 = 1 - [(n-1)/(n-k-1)] * (SSR/TSS). 1) (n-1)/(n-k-1) is always greater than 1, so Rbar^2 is always less than R^2. 2) Adding a regressor has two opposite effects on Rbar^2: SSR falls, which increases Rbar^2, but (n-1)/(n-k-1) increases (a larger k makes the denominator smaller). 3) Rbar^2 can be negative if the reduction in SSR from the regressors doesn't offset the factor (n-1)/(n-k-1).
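A worked example of the formula, with hypothetical values for SSR, TSS, n and k. The helper name `adjusted_r2` is introduced here for illustration.

```python
def adjusted_r2(ssr, tss, n, k):
    """Adjusted R^2 = 1 - [(n - 1) / (n - k - 1)] * (SSR / TSS)."""
    return 1.0 - (n - 1) / (n - k - 1) * (ssr / tss)

# Hypothetical case 1: R^2 = 0.60 with n = 50 and k = 3.
print(adjusted_r2(ssr=40.0, tss=100.0, n=50, k=3))   # below 0.60

# Hypothetical case 2: regressors that barely reduce SSR give a negative value.
print(adjusted_r2(ssr=98.0, tss=100.0, n=20, k=5))   # below 0
```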
What are the restricted and unrestricted R^2?
The restricted R^2 is the R^2 from the model where we force the null to be true (we run the regression without the Xs we specified in the null). The unrestricted R^2 is the R^2 from the model where the alternative hypothesis is allowed to be true (all the Xs are included).
What is q? What is k? Which are there more of?
q is the number of restrictions in the null hypothesis of our F-test. k is the number of X slope coefficients in the unrestricted regression. q can be at most k, so there are at least as many slope coefficients as restrictions.
Explain how the R^2 form of the F-stat is like the t-stat
The numerator is how much more of the variation in Y is explained, on average, by each of the Xs in the null (because we divide by q, the number of restrictions). The denominator is how much of the variation is not explained by any of the Xs, divided by n - total number of Xs - 1, so it plays a role like the standard error.
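Explain how the homoskedasticity-assumed F-stat is like the t-stat
The numerator is how much less is unexplained (that is, how much more is explained) per restriction when we unrestrict (add back the Xs excluded under the null). The denominator, SSRu/(n - ku - 1), scales this by the variation that is still unexplained, so it plays a role like the standard error.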
What is the variance of Bhatj in a regression with multiple regressors (assuming homoskedasticity)?
Var(Bhatj) = sigma^2 / [TSSj * (1 - R^2j)], where sigma^2 is the variance of the error ui (estimated from the residuals uhat_i) and TSSj is the total sample variation in Xj.
R^2j refers to the R^2 from the regression of Xj on all the other regressors. If the other regressors explain most of the variation in Xj (i.e. Xj is highly correlated with the other Xs), R^2j is very close to 1 and the denominator will be close to 0. This implies that the variance of Bhatj is higher in multiple regression than in single regression, unless Xj is totally uncorrelated with the other Xs. This is why you don't want to add a bunch of Xs which are not useful: you lose "efficiency" in your estimates (the variance is larger).
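A sketch, assuming simulated regressors and a known error variance of 1 (both hypothetical), that checks this variance expression against the matrix form sigma^2 * (X'X)^-1 and against the single-regression variance sigma^2 / TSSj.

```python
import numpy as np

# Hypothetical regressors: x2 is highly correlated with x1.
rng = np.random.default_rng(7)
n = 250
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)
sigma2 = 1.0   # assumed (known) error variance, for illustration

# R^2 from regressing x1 on the other regressor (with an intercept).
Z = np.column_stack([np.ones(n), x2])
coef, *_ = np.linalg.lstsq(Z, x1, rcond=None)
tss_1 = np.sum((x1 - x1.mean()) ** 2)
r2_1 = 1.0 - np.sum((x1 - Z @ coef) ** 2) / tss_1

var_formula = sigma2 / (tss_1 * (1.0 - r2_1))   # variance expression for the x1 coefficient

# Same quantity from the matrix expression sigma^2 * (X'X)^-1.
X = np.column_stack([np.ones(n), x1, x2])
var_matrix = sigma2 * np.linalg.inv(X.T @ X)[1, 1]

var_single = sigma2 / tss_1   # variance if x1 were the only regressor
print(np.isclose(var_formula, var_matrix), var_formula / var_single)
```

The printed ratio equals 1 / (1 - R^2j), which shows how much the correlation between the regressors inflates the variance.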