STAT EXAM #2 Session 4 HW 5
What is the technique for finding the best relationship between a quantitative output variable and one or more (usually quantitative) input variables?
Regression analysis
What is linear regression based on?
A scatterplot of data
How do we find the best-fit line?
By relating the explanatory variable (x) to the response variable (y), typically with the least squares method
Linear regression line
ŷ = a + bx, where a is the intercept and b is the slope
- x 'explains' the 'response' y
What does the least squares regression line do?
Minimizes the sum of the squared vertical distances from the points to the line
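A minimal sketch of the least squares line, assuming a small made-up data set (the values in `xs` and `ys` and the function names are hypothetical). It computes the slope and intercept from the standard formulas and checks that nudging either one makes the sum of squared vertical distances larger.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x using the least squares formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # S_xy and S_xx: sums of deviation cross-products and squared x-deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx        # slope
    a = y_bar - b * x_bar  # intercept
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances from each point to the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
print(round(a, 4), round(b, 4), round(best, 4))  # 2.2 0.6 2.4

# Moving the slope or intercept away from the least squares values increases the sum
print(best < sum_squared_residuals(xs, ys, a, b + 0.1))  # True
print(best < sum_squared_residuals(xs, ys, a + 0.1, b))  # True
```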
What is the vertical distance from a data point to the regression line called?
The residual
What is the equation for the difference between an actual and predicted output value?
Residual = actual - predicted = y - ŷ
What model describes the true linear relationship between the input and output variables, allowing for random variability?
The regression model: y = β0 + β1x + ε, where ε is the random error
The sum of the residuals always equals?
zero
The regression line always goes through?
The point (x̄, ȳ)
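The two properties above can be checked numerically. This sketch assumes a small hypothetical data set and a helper named `fit_line` (both made up for illustration); it computes each residual as y - ŷ, then confirms the residuals sum to zero and that the fitted line passes through (x̄, ȳ).

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b, plus the means x-bar and y-bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b, x_bar, y_bar


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, x_bar, y_bar = fit_line(xs, ys)

# Residual = actual - predicted = y - y-hat
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print([round(e, 4) for e in residuals])     # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(residuals)) < 1e-9)           # True: residuals sum to zero (up to rounding)
print(abs((a + b * x_bar) - y_bar) < 1e-9)  # True: the line passes through (x-bar, y-bar)
```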
What measures the linear relationship between 2 quantitative variables and provides the strength (strong/weak) and direction (+,-)?
The correlation coefficient (r)
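A short sketch of how r is computed, assuming hypothetical data. The sign of r gives the direction and its size gives the strength: flipping the sign of y flips the sign of r without changing its magnitude, and points lying exactly on a downward line give r = -1.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r between two quantitative variables."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(round(correlation(xs, ys), 4))                       # 0.7746: positive, fairly strong
print(round(correlation(xs, [-y for y in ys]), 4))         # -0.7746: same strength, negative
print(round(correlation(xs, [3 - 2 * x for x in xs]), 4))  # -1.0: perfect negative line
```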
What is r(correlation coefficient) between?
-1 and 1
If r is greater than 0, what kind of slope does the regression line have?
A positive slope
If r is less than 0, what kind of slope does the regression line have?
A negative slope
If r is closer to 1 or -1, the linear relationship is what?
Stronger
If r is closer to 0, the linear relationship is what?
Weaker
If an input value is k standard deviations from its mean, then the predicted output value is?
r × k standard deviations from its mean
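This follows from writing the least squares slope as b = r · s_y / s_x. The sketch below, on hypothetical data and with a made-up helper name `predicted_z`, predicts y at an input k = 2 sample standard deviations above x̄ and expresses the prediction in standard deviations of y above ȳ; the result equals r × k.

```python
import statistics


def predicted_z(xs, ys, k):
    """For an input k SDs above x-bar, return (SDs of y the prediction sits above y-bar, r)."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample SDs (divide by n - 1)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x  # least squares slope written in terms of r
    a = y_bar - b * x_bar
    y_hat = a + b * (x_bar + k * s_x)
    return (y_hat - y_bar) / s_y, r


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
z, r = predicted_z(xs, ys, k=2)
print(round(z, 4), round(2 * r, 4))  # 1.5492 1.5492: the prediction is r*k SDs from y-bar
```

Because |r| ≤ 1, the prediction is never more standard deviations from its mean than the input is, which is the "regression toward the mean" effect.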
Correlations based on __________ are stronger than correlations based on ___________
Averages, individual observations
Correlations based on ___________ of the input variable are weaker than correlations based on the ___________ of the input variable
Narrow ranges, entire range
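One deterministic illustration of the range effect (not a proof), using hypothetical data where y equals x plus a fixed alternating +1/-1 "noise" pattern. Keeping only the points with x between 4 and 7 leaves the same scatter around a much shorter stretch of the trend, so r drops.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: y = x plus a fixed +1/-1 "noise" pattern
xs = list(range(1, 11))
ys = [x + (1 if x % 2 else -1) for x in xs]

r_full = correlation(xs, ys)

# Keep only the points with x in the narrow range 4..7
narrow = [(x, y) for x, y in zip(xs, ys) if 4 <= x <= 7]
r_narrow = correlation([x for x, _ in narrow], [y for _, y in narrow])
print(round(r_full, 4), round(r_narrow, 4))  # 0.9394 0.8682
```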
What is r² called?
Coefficient of determination
What is r² between?
0% and 100%
What is the interpretation of r²?
The % of the variation in y explained by the linear regression on x
What is the interpretation of 1 - r²?
The % of the variation in y NOT explained by the linear regression on x
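A sketch of where the "% of variation explained" reading comes from, assuming hypothetical data. SST measures the total variation in y around ȳ, SSE measures what is left over around the fitted line, and 1 - SSE/SST matches the square of r.

```python
import math

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)  # SST: total variation in y

b = s_xy / s_xx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation NOT explained by x

r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = 1 - sse / s_yy  # share of the variation in y explained by the line
print(round(r ** 2, 4), round(r_squared, 4))  # 0.6 0.6 -> 60% explained
print(round(sse / s_yy, 4))                   # 0.4 -> 1 - r^2 = 40% not explained
```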
What are points whose y-values are very different from the rest of the data called?
Outliers
What do outliers affect?
r and the regression line (they can change the slope)
Outliers have ______ residuals?
Large
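A small illustration of these outlier facts, using hypothetical points that lie exactly on y = 2x plus one made-up outlier at (6, 0). Adding that single point pulls the slope and r down sharply, and the outlier ends up with the largest residual in absolute value.

```python
import math


def fit(xs, ys):
    """Return least squares intercept a, slope b, and correlation r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b, s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data lying exactly on y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
_, b_clean, r_clean = fit(xs, ys)

# Add one hypothetical outlier whose y is far from the pattern
xs_out = xs + [6]
ys_out = ys + [0]
a_out, b_out, r_out = fit(xs_out, ys_out)
residuals = [y - (a_out + b_out * x) for x, y in zip(xs_out, ys_out)]

print(round(b_clean, 4), round(r_clean, 4))  # 2.0 1.0
print(round(b_out, 4), round(r_out, 4))      # 0.2857 0.1429
print([round(e, 2) for e in residuals])      # the last entry (the outlier) is largest in size
```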
What is a variable that equals 1 if a particular condition is met and 0 if it is not?
Indicator variable
In regression, what can be represented by an indicator variable?
A binary (categorical) input variable
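A sketch of a binary input coded as an indicator, assuming two hypothetical groups labeled "control" and "treated" (labels and values made up). With a 0/1 input, the least squares intercept is the mean of the 0 group and the slope is the difference between the two group means.

```python
# Hypothetical two-group data, for illustration only
groups = ["control", "control", "control", "treated", "treated", "treated"]
ys = [3, 5, 4, 7, 9, 8]

# Indicator variable: 1 if the condition ("treated") is met, 0 if not
xs = [1 if g == "treated" else 0 for g in groups]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

mean_control = sum(y for x, y in zip(xs, ys) if x == 0) / xs.count(0)
mean_treated = sum(y for x, y in zip(xs, ys) if x == 1) / xs.count(1)
print(a, mean_control)                 # 4.0 4.0: intercept = mean of the 0 group
print(b, mean_treated - mean_control)  # 4.0 4.0: slope = difference in group means
```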
What are the 3 assumptions for regression inference?
1. Simple random sample
2. Normality: the response variable (y) is normally distributed at each value of x
3. Constant (equal) variance: the response variable (y) has a constant variance at each value of x
What are the 3 violations of the regression assumptions under which linear regression inference can't be used?
1. Nonlinear relationship
2. Non-normal (usually dealing with large samples)
3. Non-constant variance: where the variance is larger, the prediction error is larger
What are the 3 different ways to test whether the explanatory variable is a good predictor of the response variable?
1. ANOVA F test
2. t test for slope (hypothesis testing)
3. Confidence interval for slope
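A sketch of the test statistics behind the first two methods, on hypothetical data: the t statistic for the slope is b / SE(b), with SE(b) = sqrt(MSE / S_xx), and the ANOVA F statistic is MSR / MSE. In simple linear regression F equals t², so the two tests agree. Turning these into p-values needs the t distribution (n - 2 df) or the F distribution (1 and n - 2 df), which the Python standard library does not provide; if SciPy is available, `scipy.stats.linregress` reports a p-value for the slope test.

```python
import math

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sst = sum((y - y_bar) ** 2 for y in ys)

b = s_xy / s_xx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
mse = sse / (n - 2)  # error mean square, df = n - 2

# t test for the slope: t = b / SE(b)
se_b = math.sqrt(mse / s_xx)
t = b / se_b

# ANOVA F test: F = MSR / MSE, where SSR = SST - SSE has 1 df
f = (sst - sse) / 1 / mse

print(round(t, 4), round(f, 4), round(t ** 2, 4))  # 2.1213 4.5 4.5
```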
What are t-tests used for?
To determine whether the linear relationship between x and y is significant
For t-tests you are testing either?
ρ (rho, the population correlation coefficient) or β1 (the population slope)
When testing ρ (population correlation coefficient), what are the null and alternative hypotheses?
H0: ρ = 0 - NO correlation between x and y
H1: ρ ≠ 0 - There IS a relationship between x and y
When testing β1 (population slope), what are the null and alternative hypotheses?
H0: β1 = 0 - NO LINEAR relationship between x and y
H1: β1 ≠ 0 - SIGNIFICANT LINEAR relationship between x and y
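In simple linear regression the two t-tests are equivalent. This sketch, on hypothetical data, computes the t statistic for H0: ρ = 0 as t = r·sqrt(n - 2) / sqrt(1 - r²) and the t statistic for H0: β1 = 0 as b / SE(b); both give the same value, so they lead to the same conclusion.

```python
import math

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# t statistic for H0: rho = 0, computed from r
r = s_xy / math.sqrt(s_xx * s_yy)
t_rho = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# t statistic for H0: beta1 = 0, computed from the slope and its standard error
b = s_xy / s_xx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
t_slope = b / math.sqrt(sse / (n - 2) / s_xx)

print(round(t_rho, 4), round(t_slope, 4))  # 2.1213 2.1213
```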
For t-tests, what do you conclude when the p-value is low (below α)?
- Reject H0
- Conclude H1 (the alternative)
- x is a good predictor
For t-tests, what do you conclude when the p-value is high (at or above α)?
- Fail to reject H0
- Cannot conclude H1
- No evidence that x is a good predictor variable
What does the confidence interval for the slope test?
The significance of each predictor (x) individually
- If 0 is in the interval, x is not a good predictor of y
- If 0 is not in the interval, x is a good predictor of y
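A sketch of the confidence-interval check, assuming hypothetical data and a 95% confidence level. The interval is b ± t*·SE(b) with n - 2 degrees of freedom; the critical value t* = 3.182 for df = 3 is hard-coded as an assumption (read from a standard t table) because the Python standard library has no t quantile function.

```python
import math

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
se_b = math.sqrt(sse / (n - 2) / s_xx)

# Assumption: 95% critical value t* for df = n - 2 = 3, taken from a t table
t_star = 3.182
lower, upper = b - t_star * se_b, b + t_star * se_b
print(round(lower, 3), round(upper, 3))  # -0.3 1.5

if lower <= 0 <= upper:
    print("0 is in the interval: x is not a good predictor at the 5% level")
else:
    print("0 is not in the interval: x is a good predictor at the 5% level")
```

Here the interval contains 0, which matches the slope t statistic of about 2.12 falling short of t* = 3.182.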
