Week 10 & 11: Standard Multiple Regression
• Prediction analysis
• Multiple regression analyses are performed on correlational data, and correlation does not imply causation.
• While predictor variables may significantly predict an outcome variable, it does not imply that they are causing changes in the outcome variable
• Linear regression is a model to predict the value of one variable from another
• The model used is a linear one
• Therefore, we describe the relationship using the equation of a straight line.
• Multiple regression is a natural extension of this model:
o We use it to predict values of an outcome from several predictors
o It is a hypothetical model of the relationship between several variables.
general linear equation
outcome = model + error
linear equation for just one variable
method of least squares
• Red dots- actual scores
• Blue lines - difference between actual scores and predicted scores (residuals)
• Square the residuals and sum them to get SSresidual
• Method of least squares uses calculus to determine the regression line, which minimizes SS residual.
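The least squares idea can be sketched for the one-predictor case. This is a minimal illustration using the standard closed-form solution; the data (hours, scores) and function names are hypothetical, not taken from the lecture output.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the line that minimises SS residual."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares for x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def ss_residual(x, y, intercept, slope):
    """Sum of squared differences between actual and predicted scores."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: hours of revision and exam scores
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

a, b = least_squares_line(hours, scores)
best = ss_residual(hours, scores, a, b)
# Any other line (here, a slightly steeper slope) gives a larger SS residual
worse = ss_residual(hours, scores, a, b + 0.1)
print(a, b, best, worse)
```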
which variables are significant predictors?
• The null hypothesis is that the regression coefficient is 0
• Sig. = p value
• When both Revision Time and Exam Anxiety are used together as predictors of Exam Performance:
o Revision Time: p value is less than .05 (i.e., p = .029), it is therefore a significant predictor of Exam Performance
o Exam Anxiety: p value is greater than .05 (i.e., p = .06), it is therefore not a significant predictor of Exam Performance
which variable is the best predictor?
• First way is to look at the standardized regression coefficient→ in z-score units
• Since the predictors may be measured on very different scales, etc., it is not fair to compare the unstandardized regression coefficients (B values) to see which predictor is having the most influence on the outcome variable
• The standardized coefficients (β values) are z-score versions of the B values. Since z-scores all have the same mean (of 0) and standard deviation (of 1), you can directly compare them...
• The standardized regression coefficients (β) tell you how much the outcome changes by for each increase of 1 SD in the predictor (i.e., how many SDs does the outcome change by)
o So if exam revision increases by 1 SD, then exam performance is expected to increase by 0.263 SD
o Unstandardized coefficients- if we increase exam revision by one hour then exam performance is expected to increase by 0.413%
• The β value with the largest magnitude (ignoring the + and - signs) is the best predictor
• Thus, Revision Time is the best predictor of Exam Performance and indeed, it is the only significant predictor.
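The link between B and β can be sketched as β = B × (SD of predictor / SD of outcome). The numbers below are hypothetical and are not the values from the SPSS output in these notes; the function name is also just for illustration.

```python
def standardized_beta(b, sd_predictor, sd_outcome):
    """Convert an unstandardized B into a standardized beta (SD units)."""
    return b * sd_predictor / sd_outcome


# Hypothetical values: B = 0.5, predictor SD = 10, outcome SD = 20
beta_revision = standardized_beta(b=0.5, sd_predictor=10.0, sd_outcome=20.0)

# Hypothetical betas for two predictors; compare magnitudes, ignoring the sign
betas = {"revision_time": 0.25, "exam_anxiety": -0.20}
best_predictor = max(betas, key=lambda name: abs(betas[name]))
print(beta_revision, best_predictor)
```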
assessing the goodness of fit
• We use our residuals to assess whether our model overall fits the data
risk of overfitting the data
• All methods of multiple regression choose regression coefficients which minimize residuals
• Regression models make your data look the best, and therefore capitalise on chance
• The proportion of variability explained, R2, is sensitive to sample size (N), and the number of predictors (k)
• As the number of predictors (k) approaches the sample size (N), R2 approaches 1
• When we talk about the difference between r squared and adjusted r squared that's what we call R2 Shrinkage
• Regression chooses the 'best fit'
• Prone to overfit the data
• Focuses upon the idiosyncrasies of the sample data
• Regression model may not necessarily work with a new data set
• Failure to replicate is called shrinkage
• Shrinkage is best evaluated using a cross validation study
squared semi-partial correlations
• Zero-order is the Pearson correlation
• Square the semi-partial correlations (sr2) to determine the proportion of variability in the outcome uniquely accounted for by that predictor - e.g.:
o 4% of the variability in exam performance is uniquely accounted for by revision time (i.e., .199² x 100)
o 3% of the variability in exam performance is uniquely accounted for by exam anxiety (i.e., (-.171)² x 100)
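The arithmetic for the two semi-partial correlations quoted above (.199 and -.171) can be checked with a short sketch; the variable names are just labels for this example.

```python
# Semi-partial correlations quoted in the notes above
semi_partials = {"revision_time": 0.199, "exam_anxiety": -0.171}

# Square each one and multiply by 100 to get the % of unique variability
unique_pct = {name: round(sr ** 2 * 100, 1) for name, sr in semi_partials.items()}
print(unique_pct)
```

Squaring removes the sign, so a negative semi-partial correlation still gives a positive proportion of variance.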
testing the significance of the regression equation
• Just as pearson correlations are tested for significance, regression equations should also be tested for significance to show whether predictions are significantly better than chance.
• The overall significance of the regression equation can be evaluated by computing an F-ratio.
• A significant F-ratio indicates that the equation predicts a significant proportion of the variability in the y scores- more than would be expected by chance alone.
when we run regression, hope to be able to generalise to the whole population- look for outliers
• To do this, several assumptions must be met
• Violating these assumptions can affect how well the regression model fits the data, and how well the regression model can be generalised
• Assumption violations call into question the validity of the regression model and the generalizability of conclusions to the target population
predictors must not be highly correlated
• Major assumption for linear regression
• Multiple regression assumes no perfect multicollinearity
o No perfect linear relationship between two or more predictors
o Predictors (IV) should not correlate too highly with each other
• We use tolerance and VIF (variance inflation factor) to examine the issue of multicollinearity
• They are just the reciprocal of one another: tolerance = 1 / VIF
• Multicollinearity is indicated by tolerance values of less than 0.1 or VIF values of greater than 10.
• To deal with this problem we would need to eliminate either age or years as a predictor and re-run the regression analysis
if significant outlier but Cook's distance < 1 = no real need to delete (doesn't have a large effect on the regression analysis). but study further to understand why they don't fit the model.
options: remove variables, remove outliers, transform variables
evaluates overall influence of a single case on regression model as a whole.
analyse > regression > linear
dependent = outcome
model = regression
• B0 is the y-intercept
• The intercept is the value of the Y variable when all Xs = 0
• This is the point at which the regression plane crosses the y-axis (vertical)
• This value is also referred to as the constant in SPSS because it does not change across participants.
• Regression coefficients
• B1 is the regression coefficient for variable 1→ tells you that if variable 1 increases by 1 unit, how much change is predicted in the outcome variable
• B2 is the regression coefficient for variable 2→ tells you that if variable 2 increases by 1 unit, how much change is predicted in the outcome variable
• Bn is the regression coefficient for the nth regressor
• These are referred to as unstandardized regression coefficients- they are in the original units of the variable.
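How B0 and the B values combine into a prediction can be sketched as predicted Y = B0 + B1·X1 + B2·X2 + ... + Bn·Xn. The coefficients and scores below are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def predict(intercept, coefficients, values):
    """Predicted outcome = B0 + B1*X1 + B2*X2 + ... + Bn*Xn."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))


# Hypothetical model: B0 = 50, B1 = 0.5 (revision hours), B2 = -0.25 (anxiety)
b0 = 50.0
bs = [0.5, -0.25]

base = predict(b0, bs, [20, 40])
# Raise revision time by 1 unit while holding anxiety constant
one_more_hour = predict(b0, bs, [21, 40])
print(base, one_more_hour - base)
```

The difference between the two predictions equals B1, which is what "change in the outcome per 1-unit increase" means.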
• Unstandardized regression coefficient (B)
• Positive relationship between revision time and exam performance
• Negative relationship between exam anxiety and exam performance
• R or multiple R gives the correlation between actual scores and the predicted scores
• The higher this correlation, the closer the dots are to the plane of best fit- the better the regression model predicts the actual data.
• The closer it is to one the better the line fits over the dots
• Very rare to get a negative r value for regression
• R square tells you the proportion of variability in the outcome variable which is accounted for by the predictor variables → multiply by 100 to get a percentage
• Thus, 19.8% of the variability in Exam Performance is accounted for by Revision Time and Exam Anxiety, together
• Adjusted R square gives you an estimate of R2 in the population
• It takes into account the fact that the regression model might be over fitted to your particular data set - in other words, the regression model may not work as well with other sample data.
• It gives us an estimate of how much variability would be explained if the model was derived from the population rather than a sample.
• Accordingly, it takes into account sample size, the number of predictors, etc.
• The more IVs you put in the more this becomes a problem- over fitted our regression model to our data so much so that it doesn't fit with another sample from the same population.
• Smaller sample size the smaller adjusted r squared will be→ the more predictors in the model the smaller it will be as well
• Sample correlations vary around the population correlation- therefore, it can be expected to vary
• Sampling error (i.e., the discrepancy between sample and population) will increase as the sample size decreases (i.e., because the sample won't be as representative of the population), and as the number of predictors increases (i.e., because there is error associated with each predictor)
• Thus the obtained, r squared is likely to be an overestimate
• Adjusted r squared estimates what r squared is likely to be had it been derived from the population instead of the sample
comparing r^2 and adjusted r^2
• Note that if there is a large discrepancy between R2 and adjusted R2 (i.e., shrinkage), then this indicates that the regression model does not generalize well to the population.
• We want the difference to be relatively small→ if we get a really big difference it suggests that we have over fitted the data
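A sketch of how shrinkage depends on N and k, assuming the commonly used adjustment: adjusted R2 = 1 - (1 - R2)(N - 1)/(N - k - 1). The R2 of .198 and k = 2 come from the notes; both sample sizes are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R2 for a model with k predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = 0.198  # R square quoted in the notes
k = 2       # revision time and exam anxiety

adj_large_n = adjusted_r_squared(r2, n=100, k=k)  # hypothetical N
adj_small_n = adjusted_r_squared(r2, n=20, k=k)   # hypothetical N

# Shrinkage = R2 - adjusted R2; it grows as the sample gets smaller
print(r2 - adj_large_n, r2 - adj_small_n)
```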
testing the model- ANOVA
• Sums of Squares are total values.
• They can be expressed as averages (i.e., by dividing by their corresponding degrees of freedom). These averages are called Mean Squares, MS.
• The F-ratio compares the variance explained by the model (MSM) with the residual variance the model leaves unexplained (MSR)
F = MSM / MSR
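The F-ratio calculation can be sketched as follows, taking df for the model as k (number of predictors) and df for the residual as N - k - 1. The sums of squares and sample size below are hypothetical.

```python
def f_ratio(ss_model, ss_residual, n, k):
    """F = MS model / MS residual, where each MS is an SS divided by its df."""
    ms_model = ss_model / k                   # df model = k
    ms_residual = ss_residual / (n - k - 1)   # df residual = N - k - 1
    return ms_model / ms_residual


# Hypothetical values: SS model = 200, SS residual = 800, N = 103, k = 2
f_value = f_ratio(ss_model=200.0, ss_residual=800.0, n=103, k=2)
print(f_value)
```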
types of multiple regression
1. Standard Multiple Regression → Otherwise known as simultaneous or forced entry multiple regression. All predictors are entered simultaneously
2. Hierarchical Multiple Regression → The researcher decides the order in which the predictors are entered into the model
3. Stepwise Multiple Regression→ Predictors are selected (by the computer) on the basis of their semi-partial correlation with the outcome variable
• Predictors are entered in a specific order
• Theoretically important or known predictors (based on past research) are entered first OR Extraneous variables can be controlled for by entering them first
• New predictors are then entered in a separate step/block
• i.e. to assess whether new predictors significantly improve the prediction of the outcome
model testing- HMR
• Allows model testing:
o Model 1 just includes the known predictors/extraneous variables, whereas model 2 includes these variables plus the new predictors.
o Does model 2 represent a significant improvement over model 1 in predicting the outcome variable?
• You can see the unique predictive influence of a new predictor on the outcome because the known predictors/extraneous variables are held constant.
• Important to assess the change in r squared.
• How much extra variability is accounted for by addition of the new predictors?
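Whether model 2 improves significantly on model 1 is usually judged with an F test on the change in R2. A minimal sketch, assuming the standard F-change formula; all values below are hypothetical.

```python
def f_change(r2_old, r2_new, k_old, k_new, n):
    """F statistic for the change in R2 when predictors are added in a new block."""
    # Extra variance explained per added predictor
    numerator = (r2_new - r2_old) / (k_new - k_old)
    # Unexplained variance per residual degree of freedom in the larger model
    denominator = (1 - r2_new) / (n - k_new - 1)
    return numerator / denominator


# Hypothetical: model 1 (1 predictor) R2 = .10, model 2 (2 predictors) R2 = .20
f_chg = f_change(r2_old=0.10, r2_new=0.20, k_old=1, k_new=2, n=103)
print(f_chg)
```

The result is compared against an F distribution with (k_new - k_old) and (N - k_new - 1) degrees of freedom.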
forward vs backward selection
• One predictor variable to predict another outcome variable
• Exactly the same as a correlation- when we are doing correlations we are really doing a simple regression
• Can we predict exam performance based on revision time?
• What we do with the linear regression is find some kind of equation that describes the line of best fit→ the best relationship we can find between revision time and exam performance
two or more regressions
line of best fit becomes a plane of best fit
multiple regression equation
why do we want to figure out the equation?
• In the social sciences, more often than not, we don't actually want to use the equation for making predictions.
• Instead we are more interested in→ direction, which variables are significant predictors and which variable is the best predictor.
• E.g. predict their survival time
evaluating r shrinkage
• No guidelines are provided in the literature about how much shrinkage is too much
• Most seem to suggest that more than a few percent is unacceptable
• However, others argue that you should evaluate shrinkage as a proportion of the original R2 (i.e. that 5% shrinkage is acceptable if R2 = .50, but perhaps not if R2 = .20)
• Thatcher and Henson provide a good summary of the issues associated with shrinkage / adjusted R2
• They do not provide guidelines on an acceptable level of shrinkage, but do provide a couple of examples which suggest that 2-3% is acceptable (where shrinkage = R2 - adjusted R2)
ANOVA: F ratio (regression)
• If the regression model results in better prediction than using the mean, then we expect SS model to be much greater than SS residual
uses the differences between the observed data and the mean value of Y
uses the differences between the observed data and the regression line
testing the model r^2
• The portion of variance accounted for by the regression model
• Ratio of variance accounted for by the regression model to total variance
• 1 = our regression model completely explains the variance in our data
• 0 = explains nothing at all
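The two end points (1 and 0) can be illustrated with a small sketch. It computes R2 as 1 - SSresidual/SStotal, which equals SSmodel/SStotal for a least squares model with an intercept; the scores below are hypothetical.

```python
def r_squared(observed, predicted):
    """Proportion of total variance accounted for by the predictions."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_residual / ss_total


observed = [1, 2, 3]                          # hypothetical scores
perfect = r_squared(observed, [1, 2, 3])      # predictions match every score
mean_only = r_squared(observed, [2, 2, 2])    # predicting the mean for everyone
print(perfect, mean_only)
```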
you can use dichotomous variables- variables consisting of only two categories- as predictors in MR analyses, provided that you dummy code them as 0 and 1
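Dummy coding can be sketched as below. The variable (whether a student has sat the exam before) and its labels are hypothetical; the category coded 0 acts as the reference group.

```python
def dummy_code(values, reference):
    """Code the reference category as 0 and the other category as 1."""
    return [0 if v == reference else 1 for v in values]


# Hypothetical dichotomous predictor
sat_exam_before = ["no", "yes", "yes", "no"]
coded = dummy_code(sat_exam_before, reference="no")
print(coded)
```

With this coding, the B value for the predictor is the predicted difference in the outcome between the group coded 1 and the group coded 0.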
how do we test for the significance of the regression equation?
we compute the F ratio ANOVA
what does a significant F-ratio indicate?
that the equation predicts a significant proportion of the variability in the Y scores (more than would be expected by chance alone)
if regression model results in better prediction than using the _______; then we expect SSmodel to be much _____than SSresidual
good model =
large F ratio
violations of assumptions
violating assumptions can affect how well the regression model fits the data and how well the regression model can be generalised
Tolerance and VIF- variance inflation factor
amount of variability in a predictor which is not explained by the other predictors- does the predictor have a strong linear relationship with other predictors?
Bigger VIF = more multicollinearity
when is multicollinearity indicated?
tolerance <0.1 or VIF >10
how to deal with violation of multicollinearity
need to eliminate either age or years as predictor and re run regression analysis
adding more variables = adds______= reduces_______
what are the assumptions- linear regression
normally distributed errors
homoscedasticity of residuals
for any pair of observations the error terms should be not correlated with one another.
my residual should not affect your residual.
transformations may help
Durbin-Watson test examines independence of errors
values < 2 = positive correlation and values >2 = negative correlation
closer value is to 2 the better (means uncorrelated)
values <1 or >3 are cause for concern- some correlation
normally distributed errors
residuals of the regression model should be random and normally distributed with mean = 0 (most residuals clustered around the line of best fit)
predictors don't have to be normally distributed
don't want skewing- means model over or under predicting
homoscedasticity of residuals
for each value of the predictors the variance of the error term should be constant
ZRESID (Y axis) plotted against ZPRED (X axis)
at each level of the predictors, residuals should have the same variance
equal variances for the residuals
heteroscedasticity= unequal variances for the residuals
assumes modelling linear relationships
if significant must be linear- not clear u or s shape
can look at partial regression plot to look for non-linearity.
dealing with non-linearity- try transforming data
• Multiple regression assumes that you are modelling a linear relationship.
• We can use the partial regression plots to look for non-linearity → works in the same way as a partial correlation it removes the effects of one variable.
• We can also look at matrix scatterplot
cook's distance <1 and Mahalanobis distance <15
no multivariate outliers or overly influential cases
outliers- how do we look for outliers?
• An outlier is a case that differs substantially from the main trend of the data
• Outliers can bias the regression model
• If a case is an outlier, then the regression model should not predict them well, and they should have a large residual.
• To find this we perform casewise diagnostics
we also use Cook's distance and Mahalanobis distance
• Univariate outliers are values which are very different from other values for one particular variable.
• When multiple variables are being considered at the same time (as in MR), some variable combinations are unusual
o E.g. scoring high on depression and anxiety, but also scoring high on self-esteem.
o Such unusual combinations are referred to as multivariate outliers.
are cases which exert undue influence on the regression model
sample size requirements
• As a rough estimate, Field says that you need 10-15 cases of data per predictor variable.
• Two other rules of thumb given by Green (1991, cited in Field, 2009, p. 222):
o For testing the overall regression model (i.e., together, do the predictors significantly predict the outcome variable): N ≥ 50 + 8k
o For testing individual predictors (i.e., does EACH predictor significantly predict the outcome variable): N ≥ 104 + k
• Where k = number of predictor variables
o Note that if you want to test the significance of the overall regression model AND individual predictors, calculate both of the above sample sizes, and use the larger value as your required sample size
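The two rules of thumb above (N ≥ 50 + 8k and N ≥ 104 + k) can be combined in a short sketch; the function name is just for illustration.

```python
def required_sample_size(k):
    """Larger of the two rule-of-thumb sample sizes for k predictors."""
    overall_model = 50 + 8 * k          # testing the overall regression model
    individual_predictors = 104 + k     # testing each individual predictor
    return max(overall_model, individual_predictors)


print(required_sample_size(2))   # few predictors: the 104 + k rule is larger
print(required_sample_size(10))  # many predictors: the 50 + 8k rule is larger
```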
greater correlation coefficient the ______amount of error that is associated with the data
what do the values in this equation represent:
y = a + b x
y = outcome
a = y intercept
b = slope (rise/run)
x = predictor
what does linear regression do?
determines the equation of a straight line that best fits the specific set of data
how can we check for independence of errors in SPSS
Independence of errors (Durbin-Watson)
how can we check for outliers in SPSS
Outliers (Casewise Diagnostics, Cook's & Mahalanobis Distances)
how can we check for multicollinearity in SPSS
Multicollinearity (Tolerance & VIF)
how can we check for normality of residuals in SPSS
Normality of residuals (Histogram & Normal P-P Plot)
how can we check for linearity of residuals in SPSS
Linearity of residuals (Scatterplots)
how can we check for homoscedasticity of residuals in SPSS
Homoscedasticity of residuals (Scatterplots)
R is the correlation between the actual dependent variable values and the predicted values.
the higher this correlation is the closer the fit between the predicted values and the actual values.
Is the amount of variability in the DV that can be accounted for by the regression model.
adjusted R squared
gives an estimate of how much variability would be explained if the model was derived from the population rather than the sample
if there is a large discrepancy between r square and adjusted r square the model may not generalise well to the population
std. error of the estimate
tells you the average distance between the actual and predicted y values
closer R is to 0 the further away the data points will be from the line - the STD error will be large
closer R is to 1 the closer the data points will be to the line - the STD error will be small
an ANOVA examines the ______________ of the regression model
which table gives you information about whether equation predictions are significantly better than chance?
what table provides information about your regression equation and whether your independent variable is a significant predictor of your DV?
y intercept- given by the unstandardised B value for constant.
this value is the z-score version of the B
gives us an estimate for b in the population
can we get the x intercept from the coefficients table?
no we must calculate it ourselves
x intercept = -a/b = -(y intercept)/ slope
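A quick sketch of the x intercept calculation, using a hypothetical intercept and slope; the function name is illustrative only.

```python
def x_intercept(a, b):
    """Value of x at which the predicted y is 0, for the line y = a + b*x."""
    if b == 0:
        raise ValueError("a flat line (slope 0) never crosses the x-axis")
    return -a / b


# Hypothetical line: y = 10 - 2x
print(x_intercept(a=10.0, b=-2.0))
```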
independence of errors
• For any two observations the errors or residual terms should be independent of one another
• Hence observations should be independent and there should not be any systematic relationship visible among the residuals
• If the assumption is violated, data transformations may address the issue.
• We test this through using the Durbin-Watson value.
• Closer to 2 the better the assumption is met
• Less than 2 = positive correlation
• Greater than 2 = negative correlation
• Less than 1 or greater than 3 is cause for concern.
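The Durbin-Watson value can be sketched using its standard definition: the sum of squared differences between successive residuals, divided by the sum of squared residuals. The residuals below are hypothetical and listed in case order.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in case order."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that flip sign every case (negative correlation)
alternating = durbin_watson([1, -1, 1, -1])
# Hypothetical residuals that stay on one side for a while (positive correlation)
clustered = durbin_watson([1, 1, -1, -1])
print(alternating, clustered)
```

The alternating residuals give a value above 2 and the clustered residuals give a value below 2, matching the direction rules above.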
examines how far each case is from the means of the predictors
we need to look up the critical value
if the value is below this critical value- there are no multivariate outliers
• Measure of multivariate outliers
• Moderate sample size of 100, values greater than 15 are considered problematic
• Values of 11 or greater are problematic for small samples N=30
• 25 or greater for large samples (N=500)
we can also look up the critical value by doing the following:
o Mahalanobis distance is distributed as chi-square with degrees of freedom equal to the number of predictors (K)
o Therefore, compare Mahalanobis distance against the critical chi-square value for df= k
o However, because this is a sensitive test, use a conservative alpha level of a = 0.001
- if your maximum MD is greater than the critical value, it suggests the presence of one or more multivariate outliers
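The critical value check can be sketched with a small lookup of chi-square critical values at alpha = .001, taken from a standard chi-square table and rounded to 2 decimal places. The function name and the example distances are hypothetical.

```python
# Chi-square critical values at alpha = .001, keyed by df = number of predictors
CHI2_CRIT_001 = {1: 10.83, 2: 13.82, 3: 16.27, 4: 18.47, 5: 20.52}


def has_multivariate_outlier(max_md, k):
    """True if the largest Mahalanobis distance exceeds the critical value."""
    return max_md > CHI2_CRIT_001[k]


# Hypothetical maximum Mahalanobis distances for a model with 2 predictors
print(has_multivariate_outlier(15.0, k=2))
print(has_multivariate_outlier(12.0, k=2))
```

The table only covers up to 5 predictors; for larger models the critical value would need to be looked up separately.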
examines the overall influence of a single case on the regression model
> 1 indicate that the case is having an overly influential effect on the regression model
Tolerance and VIF examine the issue of multicollinearity
they are just the reciprocal of one another
tolerance = 1 /VIF
is the amount of variability in the predictor that is not explained by other predictors.
0.719 tolerance would mean that 71.9% of the variance in one IV does not overlap with the remaining predictors
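The reciprocal relationship and the cut-offs (tolerance < 0.1 or VIF > 10) can be sketched as follows, using the 0.719 tolerance from the example above; the function names and the 0.05 value are hypothetical.

```python
def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of tolerance."""
    return 1.0 / tolerance


def is_multicollinear(tolerance):
    """Flag a predictor using the tolerance < 0.1 or VIF > 10 cut-offs."""
    return tolerance < 0.1 or vif_from_tolerance(tolerance) > 10


vif = vif_from_tolerance(0.719)
print(round(vif, 2), is_multicollinear(0.719))
print(is_multicollinear(0.05))  # hypothetical low tolerance
```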
normality of residuals
through visual inspection of the histogram we can see whether the normality of errors is met
we can also look at P-P plots.
the residuals are normally distributed if the dots fall on the diagonal line.
linearity and homoscedasticity
this assumption is met when the graph looks like a random array of dots, roughly rectangularly distributed, with most of the scores concentrated in the centre
R square change - hierarchical regression
looking at the values we can see how much more of the variance model 2 explains compared to model 1
normally distributed errors
• The residuals for the regression model should be random, and normally distributed with a mean of zero
• This does not mean that the predictors have to be normally distributed- predictors do not need to be normally distributed- although it does improve the chances of the assumption being met
• If we run a line through the centre we want to be able to get a bell curve
what is the consequence of multicollinearity?
• Main problem - two things that are highly correlated are redundant
o If age and years are so highly correlated, why are we putting both in our model?
• Increases our error term
• Makes r2 smaller than it can be
• It makes the individual coefficients less trustworthy
• Tolerance= the amount of variability in a predictor which is NOT explained by the other predictors.
• If we had a tolerance value of 0.079 it means that only 7.9% of the variance in age does not overlap with the other predictors. Thus most of the variability in age is accounted for by the other predictors- highly correlated
dealing with non-linearity
• We can try transforming our data- applying a log, square root, or squared transformation
• Once you transform the data it makes it harder to interpret- you will then be showing that there is/isn't a relationship between the outcome variable and the transformed predictor.
• You can transform the data back.
partial eta squared
Partial η² = SSfactor/(SSfactor+SSerror)
most common effect size for factorial ANOVA
a version of eta squared that is the proportion of variance that a variable explains when excluding other variables in the analysis.
the proportion of variance that a variable explains that is not explained by other variables
What is the first preference as method of detection of outliers?
what is the consequence of breaking linearity?
weakens the analysis by reducing its power, because the full extent of the curvilinear relationship among the IVs and DV cannot be mapped
what are the parameter estimates in MR
unstandardised regression coefficients (b weights)
Regression analysis can only use _________variables
continuous or dichotomous IVs
Regression analyses are best used when:
IVs are strongly correlated with the DV and not with the other IVs
If the DV is skewed then the requirement of subjects _______
In MR, a residual is the difference between predicted and obtained:
The squared multiple correlation (R2) is the proportion of variation:
in Y' predictable by the best linear combination of IVs
In standard MR, overlapping variance between the IVs:
contributes to R2, not to any individual IV
In sequential (hierarchical) MR, overlapping variance between the IVs:
is taken up by early entering IVs
A risk in interpreting IVs in a sequential (hierarchical) MR is that correlated IVs:
may seem unimportant if they enter the equation late
The major Ho of any multiple regression is:
Multiple R = 0