Regression Analysis

Predictive Analysis
allows one to make forecasts for future events
Linear Relationships and Regression Analysis
PREDICTION: statement of what is believed will happen in the future made on the basis of past experience or prior observation

PREDICTIVE MODEL: uses relationships among variables to make a prediction
Regression analysis is a predictive analysis technique in which one or more variables are used to predict the level of another by use of the straight-line formula, y=a+bx
-BIVARIATE REGRESSION ANALYSIS is a type of regression in which only two variables are used in the regression (predictive) model
-ONE VARIABLE is termed the DEPENDENT VARIABLE (Y), the other is termed the INDEPENDENT VARIABLE (X)
-the INDEPENDENT VARIABLE is used to predict the DEPENDENT VARIABLE, and it is the X in the REGRESSION FORMULA
Linear Relationships and Regression Analysis
-Regression analysis is a predictive analysis technique in which one or more variables are used to predict the level of another by use of the STRAIGHT-LINE FORMULA

y=a+bx, Sales($)=$10,000+1.3*Advertisement ($)
1. For each additional dollar spent on advertisement, sales increase by $1.30
2. If advertisement = 0, sales = $10,000
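A minimal Python sketch of plugging numbers into this straight-line model (the intercept and slope are the slide's illustrative values; the ad spend is an assumed figure, not real data):

    # Hypothetical prediction with the slide's straight-line model: Sales = $10,000 + 1.3 * Advertisement
    a, b = 10_000, 1.3           # intercept and slope from the example above
    advertisement = 5_000        # assumed advertising budget in dollars
    predicted_sales = a + b * advertisement
    print(predicted_sales)       # 16500.0 -> predicted sales of $16,500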
Understanding Prediction 1
-Predictive models based on regression analysis: use ASSOCIATIONS among variables to make a prediction
-> when variable X equals x1, then Y is predicted to equal y1

-Relies on observed, past relationships between what you want to predict and some other variables
-> ex: predict sales from time of the year, competition, advertising, economy...
Goodness of Prediction
-Predictive models should be judged as to their "goodness" (accuracy)
-The goodness of a prediction is based on examination of the differences between the PREDICTED VALUES and the ACTUAL (observed) VALUES
->These differences are called RESIDUALS: a comparison of predictions to actual values. Greater residuals -> more error -> lower accuracy of prediction
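A short Python sketch of how residuals are computed (all numbers invented for illustration):

    import numpy as np

    actual    = np.array([12.0, 15.0, 11.0, 18.0])   # observed Y values (invented)
    predicted = np.array([11.5, 16.0, 10.0, 17.5])   # model's predicted Y values (invented)
    residuals = actual - predicted                   # residual = actual - predicted
    print(residuals)                                 # [ 0.5 -1.   1.   0.5]
    print(np.abs(residuals).mean())                  # average size of the error: 0.75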
R2 (R square)
-On the basis of the value of the residuals, it can be determined what percentage of variation in variable "Y" is explained by the "Xs". When the value of residuals is high, the % of variability in "Y" explained is low
-The % of "Y" that is explained, or accounted for, by all the "Xs", is indicated by the R2 WHICH RANGES FROM 0 to +1
*WE WANT SMALLER LEVELS OF RESIDUALS BECAUSE IT GIVES HIGHER LEVELS OF GOODNESS OF PREDICTION*
In bivariate regression, R2 = r2 (the square of the correlation coefficient)
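One way to see the link between residuals, R2, and r2 in the bivariate case, using invented data (a sketch, not course material):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])        # invented, roughly linear data
    b, a = np.polyfit(x, y, 1)                     # least-squares slope and intercept
    residuals = y - (a + b * x)
    r2 = 1 - residuals.var() / y.var()             # R2 = 1 - unexplained variation / total variation
    print(round(r2, 3))                            # close to 1: small residuals, good prediction
    print(round(np.corrcoef(x, y)[0, 1] ** 2, 3))  # same value: in bivariate regression R2 = r2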
Understanding Prediction 2
-Regression models do NOT GIVE EVIDENCE FOR CAUSALITY (same logic as association analysis)
-Remember, if you find an ASSOCIATION between two variables USING A DESCRIPTIVE RESEARCH DESIGN, let's say sales and $ spent on advertising, you do not have evidence for causality

-With a regression model you can find whether the VARIABLES "Xs" EXPLAIN (i.e. account for what happens to) VARIABLE "Y". That is, knowing the values of the Xs, you can estimate the value of "Y" on the basis of past associations between the "Xs" and "Y"
Example
-You measure overall satisfaction and satisfaction with food
-You run a regression: Satisfaction = b0 + b1*Satisfaction with food
-If b1 is sig. and positive -> satisfaction with food explains overall satisfaction / people who are satisfied with their food tend to be satisfied overall
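A hedged sketch of this bivariate regression in Python with statsmodels (the course uses SPSS; the survey data and column names below are invented for illustration):

    import pandas as pd
    import statsmodels.formula.api as smf

    # Invented survey responses: overall satisfaction and satisfaction with food (1-7 scales)
    df = pd.DataFrame({
        "overall": [5, 6, 4, 7, 3, 6, 5, 7, 4, 6],
        "food":    [4, 6, 3, 7, 2, 5, 5, 6, 4, 6],
    })

    model = smf.ols("overall ~ food", data=df).fit()   # overall = b0 + b1*food
    print(model.params)     # b0 (Intercept) and b1 (slope for food)
    print(model.pvalues)    # if p for food < .05 and b1 > 0, food satisfaction explains overall satisfaction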
Linear Relationships and Regression Analysis
-Regression analysis is a predictive analysis technique in which one or more variables are used to predict the level of another by use of the STRAIGHT-LINE FORMULA
-BIVARIATE (SIMPLE) REGRESSION (one predictor)
-MULTIPLE REGRESSION (two or more predictors)
Regression Analysis
-BIVARIATE REGRESSION ANALYSIS is a type of regression in which only TWO VARIABLES ARE USED IN THE REGRESSION, predictive model
-->One variable is termed the DEPENDENT VARIABLE (y), the other is termed the INDEPENDENT VARIABLE (x)
y=a+bx
where...

y=the predicted variable
x= the variable used to predict y
a= the INTERCEPT, or point where the line cuts the y-axis when x=0
b=the SLOPE or the change in y for any 1-unit change in x

Note: Y must be a METRIC VARIABLE. The X(s) must be METRIC VARIABLES or DICHOTOMOUS VARIABLES (a dichotomous variable takes on only one of two possible values)
Bivariate Linear Regression Analysis: Basic Procedure
-The regression model, intercept (a), and slope (b) must always BE TESTED FOR STATISTICAL SIGNIFICANCE, since we are estimating them with a sample. We are interested in the values of B and A in the population
-Regression analysis predictions are estimates that have some amount of error to them
Testing for Statistical Significance of the Intercept and the Slope
-THE T-TEST/P-VALUE: used to determine whether the intercept (a) and slope (b) are significantly different from zero, i.e. whether there is enough evidence to suggest that they ARE DIFFERENT FROM ZERO IN THE POPULATION

-If the computed t value is greater than 1.96 (in absolute value) or the p-value is <.05, then the parameter is different from zero.

y=a+bx

Note that simple regression is somewhat similar to the correlation coefficient, but it also gives you the slope coefficient (b) used for prediction
Regression predictions are made with confidence intervals
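For instance, in Python/statsmodels a prediction can be reported together with its confidence interval (same invented satisfaction data as in the earlier sketch; not course material):

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({"overall": [5, 6, 4, 7, 3, 6, 5, 7, 4, 6],
                       "food":    [4, 6, 3, 7, 2, 5, 5, 6, 4, 6]})   # invented data
    model = smf.ols("overall ~ food", data=df).fit()

    new = pd.DataFrame({"food": [6]})         # predict overall satisfaction when food satisfaction = 6
    pred = model.get_prediction(new)
    print(pred.predicted_mean)                # the point prediction
    print(pred.conf_int(alpha=0.05))          # the 95% confidence interval around that prediction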
Regression analysis with SPSS
-Does "being fashionable" explain whether a catalog matches the Gucci image

Y=a+b*X
Y=Matches the Gucci image
X= Fashionable

Run a regression equation with "matches the Gucci image" as the dependent variable, and "fashionable" as the independent variable
ANOVA Table
The results you will find on the ANOVA table (see chart):
-The p-value from the ANOVA table tells whether the model has statistically significant predictive capability
-That is, if, overall, the model can predict the dependent variable
-If the p-value is <.05, the model DOES predict the dependent variable (there is a linear relationship)
-If the p-value is >.05, the model DOES NOT predict the dependent variable (there is no linear relationship). YOU CAN STOP HERE.
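In Python/statsmodels the same overall test shows up as the F-statistic's p-value (a sketch with invented ratings; SPSS reports it in the ANOVA table):

    import pandas as pd
    import statsmodels.formula.api as smf

    # Invented catalog ratings: does "fashionable" predict "matches the Gucci image"?
    df = pd.DataFrame({"gucci_image": [5, 6, 4, 7, 3, 6, 5, 7],
                       "fashionable": [4, 6, 3, 7, 2, 5, 5, 6]})
    model = smf.ols("gucci_image ~ fashionable", data=df).fit()

    print(model.f_pvalue)    # the overall model p-value (the ANOVA table's Sig.)
    # p < .05  -> the model predicts the dependent variable (linear relationship)
    # p >= .05 -> the model does not predict it; stop here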
Multiple R (R2)
-R2 RANGES FROM 0 TO +1 and represents the AMOUNT OF THE DEPENDENT VARIABLE THAT IS "EXPLAINED", or accounted for, by the combined independent variables
-It is a measure of the STRENGTH OF THE LINEAR RELATIONSHIP between the independent and dependent variable. It is an indication of how well the independent variables can predict the dependent variable in multiple regression
->Convert the R2 into a percentage: R2 of .27 means that the regression model explains 27% of the variability in dependent variable
Bivariate Linear Regression Analysis: Basic Procedure
-Least squares criterion: used in regression analysis; guarantees that the "best" straight-line slope and intercept will be calculated
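A minimal numpy sketch of the least squares criterion: the slope and intercept computed below are the ones that minimize the sum of squared residuals (data invented):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])                         # invented data

    # Closed-form least squares estimates for y = a + b*x
    b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    a = y.mean() - b * x.mean()
    print(a, b)                                                     # matches np.polyfit(x, y, 1), which returns b first, then a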
Multiple Regression Analysis
-Multiple regression analysis uses the same concepts as bivariate regression analysis, but uses MORE THAN ONE INDEPENDENT VARIABLE
-General conceptual model identifies independent and dependent variables and shows their basic relationships to one another
Multiple Regression Analysis
-Multiple regression means that you have more than one independent variable to predict/explain a single dependent variable

Multiple Regression Equation: y = a + b1x1 + b2x2 + b3x3 + ... + bmxm

where: y=the dependent, or predicted, variable
xi=independent variable i
a=the intercept
bi=the slope for independent variable i
m= the number of independent variables in the equation

Note: Y must be a METRIC VARIABLE. The X(s) must be METRIC VARIABLES or DICHOTOMOUS VARIABLES (a dichotomous variable takes on only one of two possible values)
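A hedged Python/statsmodels sketch of a multiple regression with one metric and one dichotomous predictor (all data invented; the course itself uses SPSS):

    import pandas as pd
    import statsmodels.formula.api as smf

    # Invented data: predict sales from advertising spend and a 0/1 promotion flag
    df = pd.DataFrame({
        "sales":     [12, 15, 11, 18, 14, 20, 13, 19],
        "adspend":   [1.0, 2.0, 0.5, 3.0, 1.5, 3.5, 1.0, 3.0],
        "promotion": [0, 1, 0, 1, 0, 1, 1, 1],
    })

    model = smf.ols("sales ~ adspend + promotion", data=df).fit()   # y = a + b1*x1 + b2*x2
    print(model.params)       # the intercept a and the slopes b1, b2
    print(model.rsquared)     # share of the variation in sales explained by the predictors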
Example of Multiple Regression
Multiple Regression is a powerful tool because it tells us WHICH factors predict the dependent variable, WHICH WAY (the sign) each factor influences the dependent variable, and even HOW MUCH (the size of b) each factor influences it
Adjusted R2
-R2 RANGES FROM 0 to +1 and represents the AMOUNT OF THE DEPENDENT VARIABLE THAT IS "EXPLAINED", or accounted for, by the combined independent variables
-ADJUSTED R2 is a modification of R2 that ADJUSTS FOR THE NUMBER OF EXPLANATORY terms in a model
-Useful when you are comparing models with different numbers of predictors
-->In general, the more predictors you have, the higher the R2; for this reason, it is not "fair" to compare models with different #s of predictors. This is why we introduce an adjustment.
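A short sketch of why adjusted R2 is used when comparing models with different numbers of predictors (invented data):

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({"y":  [12, 15, 11, 18, 14, 20, 13, 19],
                       "x1": [1.0, 2.0, 0.5, 3.0, 1.5, 3.5, 1.0, 3.0],
                       "x2": [5, 3, 6, 2, 4, 1, 5, 2]})              # invented data

    small = smf.ols("y ~ x1", data=df).fit()
    large = smf.ols("y ~ x1 + x2", data=df).fit()
    # R2 never goes down when a predictor is added; adjusted R2 penalizes the extra term
    print(small.rsquared, small.rsquared_adj)
    print(large.rsquared, large.rsquared_adj)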
Standardized beta coefficient
-STANDARDIZED BETA COEFFICIENTS: betas that indicate the relative importance of alternative predictor variables. They are used to compare the different b's and see which has the greatest impact on Y
-The basic idea is that different variables are measured with different units of measurement (e.g. units of attitude, units of preference...). For this reason, the (unstandardized) betas are not comparable. Standardized beta coefficients use comparable units of measurement so that the betas can be compared
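One common way to obtain standardized betas is to z-score every variable before fitting, so all slopes are in standard-deviation units (a sketch with invented data):

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({"y":  [12, 15, 11, 18, 14, 20, 13, 19],
                       "x1": [1.0, 2.0, 0.5, 3.0, 1.5, 3.5, 1.0, 3.0],
                       "x2": [5, 3, 6, 2, 4, 1, 5, 2]})              # invented data
    z = (df - df.mean()) / df.std()                                  # put every variable on the same scale

    model = smf.ols("y ~ x1 + x2", data=z).fit()
    print(model.params)   # standardized betas: the larger the absolute value, the bigger the impact on Y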
Variance Inflation Factor
Basic assumptions:
-Independence assumption: the independent variables must be statistically independent and uncorrelated with one another
-Variance inflation factor (VIF) can be used to assess and eliminate multicollinearity
-CUTOFF VALUE: 10
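A sketch of computing VIF with statsmodels (invented predictors; x3 is nearly a multiple of x1, so both show inflated VIFs):

    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    from statsmodels.tools import add_constant

    X = pd.DataFrame({"x1": [1.0, 2.0, 0.5, 3.0, 1.5, 3.5, 1.0, 3.0],
                      "x2": [5, 3, 6, 2, 4, 1, 5, 2],
                      "x3": [2, 4, 1, 6, 3, 7, 2, 5]})     # invented predictors
    X = add_constant(X)                                    # VIF is judged against a model that has an intercept

    for i, name in enumerate(X.columns):
        if name != "const":
            print(name, variance_inflation_factor(X.values, i))   # drop predictors with VIF > 10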
Multiple Regression with SPSS
Which of the following variables predict whether a catalog matches the Gucci image?

Y = a + b1X1 + b2X2 + b3X3 + b4X4 + b5X5
Y = Matches the Gucci image
X1 = attention grabbing
X2 = fun to look at
X3 = show quality values
X4 = fashionable
X5 = easy to place an order
Trimming
To get more accurate estimates, eliminate from the model the variables that are not significant (i.e. the p-value for their b's is >.05). This is particularly important when you want to use the results to make predictions. To trim the model (a sketch follows this list):
1. Eliminate one variable at a time, starting from the one that has the highest p-value
2. Re-run the analysis
3. Repeat until only significant variables are left in the model
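A hedged Python/statsmodels sketch of the trimming loop described above (invented data): drop the predictor with the highest p-value, refit, and repeat until all remaining slopes are significant.

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({"y":  [12, 15, 11, 18, 14, 20, 13, 19],
                       "x1": [1.0, 2.0, 0.5, 3.0, 1.5, 3.5, 1.0, 3.0],
                       "x2": [5, 3, 6, 2, 4, 1, 5, 2],
                       "x3": [3, 2, 4, 1, 3, 2, 4, 1]})    # invented data
    predictors = ["x1", "x2", "x3"]

    while predictors:
        model = smf.ols("y ~ " + " + ".join(predictors), data=df).fit()
        pvals = model.pvalues.drop("Intercept")            # p-values of the slopes only
        if pvals.max() <= 0.05:                            # everything left is significant
            break
        predictors.remove(pvals.idxmax())                  # eliminate the least significant variable
    print(predictors)                                      # the trimmed model's predictors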
Stepwise Multiple Regression
The one independent variable that is statistically significant and explains the most variance is entered into the multiple regression equation
Stepwise Multiple Regression
-Then each statistically significant independent variable is added in order of variance explained
-All insignificant independent variables are eliminated
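For contrast with trimming, a minimal sketch of the forward, stepwise idea (invented data): at each step, add the statistically significant predictor that raises R2 the most, and stop when none qualifies.

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({"y":  [12, 15, 11, 18, 14, 20, 13, 19],
                       "x1": [1.0, 2.0, 0.5, 3.0, 1.5, 3.5, 1.0, 3.0],
                       "x2": [5, 3, 6, 2, 4, 1, 5, 2],
                       "x3": [3, 2, 4, 1, 3, 2, 4, 1]})    # invented data
    remaining, selected = ["x1", "x2", "x3"], []

    while remaining:
        # Fit a candidate model for each remaining predictor; keep only significant candidates
        fits = {x: smf.ols("y ~ " + " + ".join(selected + [x]), data=df).fit() for x in remaining}
        fits = {x: m for x, m in fits.items() if m.pvalues[x] < 0.05}
        if not fits:
            break                                          # no significant variable left to add
        best = max(fits, key=lambda x: fits[x].rsquared)   # the one explaining the most variance
        selected.append(best)
        remaining.remove(best)
    print(selected)                                        # predictors entered in order of variance explained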
3 Warnings Regarding Multiple Regression Analysis
1-Regression is a statistical tool, not a cause-and-effect statement
2-Regression analysis should not be applied outside the boundaries of data used to develop the regression model