AP Stats Unit 9: Linear Regression Inference Vocab + Variables
Terms in this set (36)
To review: we use least-squares regression to study the relationship between two variables, both of which are (quantitative, categorical).
Before doing regressions to study the relationship between two quantitative variables, we should explore the data by examining a _______ and a __________.
(1) Scatterplot
(2) Residual plot
The statistic that describes the strength of a linear relationship, that is the same whichever variable is thought of as the explanatory variable, and which has a familiar relationship to the percent of variance in one variable explained by the other, is the ______ ______.
Correlation coefficient (or just, the correlation)
What is a residual?
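A residual is the signed vertical distance from the regression line to the data point: the observed y minus the predicted y, or y - ŷ. It is positive when the point lies above the line and negative when it lies below.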
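To make the definition concrete, here is a minimal sketch that fits a least-squares line and lists the residuals. The data set and the helper name `least_squares` are made up for illustration; they are not part of the original card set.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = least_squares(x, y)
predicted = [intercept + slope * xi for xi in x]

# Residual = observed y minus predicted y (y - y-hat).
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

for xi, res in zip(x, residuals):
    print(f"x = {xi}: residual = {res:+.2f}")
```

Points above the line get positive residuals and points below get negative ones, and for a least-squares line the residuals sum to zero (up to rounding).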
The r-squared (r²) value, which is part of the regression output, tells us how much of what is what?
How much of the variation in the y variable is accounted for by the linear relationship with x.
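As a rough illustration of the correlation and r² cards above, the sketch below computes r from its definition, checks that it is the same whichever variable is treated as explanatory, and squares it to get r². The data and the function name `correlation` are assumptions made for this example.

```python
import math

def correlation(x, y):
    """Sample correlation coefficient r for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
r_swapped = correlation(y, x)  # swap the roles of x and y
r_squared = r ** 2

print(f"r = {r:.4f}, r with variables swapped = {r_swapped:.4f}")
print(f"r^2 = {r_squared:.2f}")
```

Here r² comes out to 0.60, so about 60% of the variation in y is accounted for by the linear relationship with x in this made-up sample.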
Suppose we draw lots of samples and compute a regression line for each sample. The slope and intercept of each sample line estimate true values. Thus the slope and intercept we obtain from our sample are _____ that estimate population ______.
One of the conditions for regression inference is that for any fixed value of x, the response variable y varies according to a _____ distribution.
Another assumption for regression inference is that for any fixed value of x, the repeated responses y are ____ of each other.
Another assumption for regression inference is that the means of the sets of y-values for each x value have what relationship to the x values?
That the means of the y's for each x are a linear function of x: mean for y's = alpha + beta * x
(µy = α + βx)
Another assumption for regression inference is that what measure of dispersion is equal for each value of x?
The standard deviation of the y's for the various x values.
True or False: the slope and intercept we obtain from the least-squares regression for our sample are unbiased estimators of the slope and intercept, respectively, of the line connecting the population means of y for each x.
What statistic estimates the standard deviation of the y values around the true regression line (in other words, the standard deviation of the y values around their mean for each x)?
The statistic s, called the regression standard error: the standard deviation of the residuals.
The statistic s represents the estimate of the standard deviation ____ in the regression model.
The parameter we are usually most interested in estimating from regression output is the (slope, y-intercept) of the line.
What is the general form for a confidence interval for regression slope?
b ± t* × SEb
(the b in SEb is a subscript; t* is the critical value from the t distribution with n - 2 degrees of freedom)
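A small sketch of the interval calculation, assuming you already have b and SEb from regression output and t* from a t table. The numbers below are illustrative (n = 5, so df = 3, where the 95% critical value is 3.182); the function name is hypothetical.

```python
def slope_confidence_interval(b, se_b, t_star):
    """Return (lower, upper) for the interval b ± t* × SEb."""
    margin = t_star * se_b
    return b - margin, b + margin

# Illustrative values, not taken from the card set:
b = 0.6          # sample slope
se_b = 0.2828    # standard error of the slope
t_star = 3.182   # 95% critical value for df = n - 2 = 3

lower, upper = slope_confidence_interval(b, se_b, t_star)
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

With these values the interval contains 0, which matches the idea that a slope of 0 would not be rejected by a two-sided test at the 5% level.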
The most commonly tested hypothesis about regression is that β, the population slope, is 0. Can you phrase this hypothesis in other ways?
Ho: β = 0
(1) That the straight line dependence on x has no value in predicting y.
(2) That the population correlation between x and y is 0.
(3) That there is no true linear relationship between x and y.
If you form the ratio of the slope obtained in your sample to the standard error of that slope, what is the sampling distribution of that statistic?
When the null hypothesis β = 0 is true and the conditions for inference are met, it follows a t distribution with n - 2 degrees of freedom.
Regression output usually gives a two-sided p value for the hypothesis test that the population slope is 0. How do you obtain a one-sided p-value for the same hypothesis?
Divide the two-sided p-value by two, provided the sample slope falls in the direction stated by the alternative hypothesis.
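The halving rule can be sketched as below. The function name and its arguments are assumptions for this example. The key point is that halving only applies when the sample slope falls on the side the alternative hypothesis predicts; otherwise the one-sided p-value is 1 minus half the two-sided value.

```python
def one_sided_p(two_sided_p, b, alternative):
    """Convert a two-sided p-value for H0: beta = 0 to a one-sided one.

    alternative is "greater" (Ha: beta > 0) or "less" (Ha: beta < 0).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    slope_agrees = (b > 0) if alternative == "greater" else (b < 0)
    half = two_sided_p / 2
    return half if slope_agrees else 1 - half

# Illustrative values: two-sided p = 0.08 with a positive sample slope.
p_greater = one_sided_p(0.08, b=0.6, alternative="greater")
p_less = one_sided_p(0.08, b=0.6, alternative="less")
print(p_greater, p_less)
```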
Suppose that in a residual plot, the values are close to 0 when x is low, but the residuals get bigger and bigger in absolute value as the x values get greater. What condition of regression is violated in this circumstance?
The condition that the standard deviation of the response around the true line is the same everywhere.
Someone examines a residual plot and a scatterplot and observes a curvilinear pattern. What condition of regression is being violated, and what should the researcher consider doing in order to correct this?
The condition violated is that the true relationship is linear. The researcher should consider transforming one or more of the variables.
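One common transformation is taking the logarithm of y when the data grow exponentially. The sketch below uses made-up data that double with each step in x: the changes in y keep growing (a curve), while the changes in log y are constant (a straight line). This is only one possible transformation, chosen here for illustration.

```python
import math

# Made-up exponential data: y doubles each time x increases by 1.
x = [1, 2, 3, 4, 5]
y = [3 * 2 ** xi for xi in x]

log_y = [math.log(yi) for yi in y]

# Change in y (and in log y) between consecutive x values.
raw_diffs = [y[i + 1] - y[i] for i in range(len(y) - 1)]
log_diffs = [log_y[i + 1] - log_y[i] for i in range(len(y) - 1)]

print("changes in y:    ", raw_diffs)
print("changes in log y:", [round(d, 4) for d in log_diffs])
```

After a transformation like this, the scatterplot and residual plot should be checked again before doing inference on the transformed data.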
What is the equation for a t statistic?
t = (b - β) / SEb, where β is the slope stated in the null hypothesis (usually 0)
This is used in the calculations for a test of significance.
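A minimal sketch of the test statistic, assuming b and SEb are read from regression output. The whole difference between the sample slope and the hypothesized slope is divided by SEb. The function name and the numbers are illustrative only.

```python
def slope_t_statistic(b, se_b, beta_0=0.0):
    """t = (b - beta_0) / SE_b, compared to a t distribution with n - 2 df."""
    return (b - beta_0) / se_b

# Illustrative values, not taken from the card set:
b = 0.6        # sample slope
se_b = 0.2828  # standard error of the slope

t = slope_t_statistic(b, se_b)
print(f"t = {t:.3f}")
```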
What does SEb equal? (the equation)
SEb = s / √(∑(x - x̄)²)
s is known as the __________ and it is the sample standard deviation of the residuals. What is the equation for s?
(1) regression standard error
(2) s = √((∑Residuals²)/(n-2))
Residual = y - ŷ
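The formulas for s and SEb can be checked with a short calculation. This is a sketch on made-up data; the function name `regression_standard_errors` is hypothetical.

```python
import math

def regression_standard_errors(x, y):
    """Return (s, se_b) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Sum of squared residuals, where residual = y - y-hat.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    s = math.sqrt(sse / (n - 2))   # regression standard error
    se_b = s / math.sqrt(sxx)      # standard error of the slope
    return s, se_b

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s, se_b = regression_standard_errors(x, y)
print(f"s = {s:.4f}, SEb = {se_b:.4f}")
```

The divisor is n - 2 rather than n - 1 because two quantities (slope and intercept) are estimated from the data before the residuals are computed.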
Predicted y value when x = 0.
Predicted change in y if x increases by 1.
Predicted value of the response variable.
Indicates strength and direction of a linear relationship.
Percent of variation in the y-variable that can be explained by the linear relationship with x.
Population average change in y for a 1 unit increase in x.
Test statistic for a significance test for β.
Standard error for the sampling distribution of b. Used to write confidence intervals and find t.
Typical error size when using the regression line to make predictions.
n - 2
Number of degrees of freedom for a t distribution for β.
Probability of a slope as extreme as observed or more extreme if the population slope equals 0.
y - ŷ
The residual: the observed value of y minus the predicted value of y.