17 terms

Response variable

A response variable measures an outcome of a study

Explanatory variable

An explanatory variable may help explain or influence changes in a response variable

Scatterplot

A scatterplot shows the relationship between two quantitative variables measured on the same individuals

The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis

Each individual in the data appears as a point in the graph

Positive association

Two variables have a positive association when above-average values of one tend to accompany above-average values of the other, and when below-average values also tend to occur together

Negative association

Two variables have a negative association when above-average values of one tend to accompany below-average values of the other

Correlation r

The correlation r measures the direction and strength of the linear relationship between two quantitative variables
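
One standard way to compute r is as the average (over n − 1) of the products of standardized scores. This is a minimal Python sketch of that formula with made-up data, not tied to any particular statistics package:

```python
from statistics import mean, stdev

def correlation(x, y):
    """Correlation r: the average (over n - 1) of the products
    of the standardized x- and y-scores."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)

# Example data (hypothetical): a fairly strong positive association
r = correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

Points that fall exactly on a rising line give r = 1, the strongest possible positive linear relationship.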

Regression line

A regression line is a line that describes how a response variable y changes as an explanatory variable x changes

We often use a regression line to predict the values of y for a given value of x

Regression line, predicted value, slope, y intercept

Suppose that y is a response variable and x is an explanatory variable

A regression line relating y to x has an equation of the form

ŷ = a + bx

In this equation,

ŷ is the predicted value of the response variable y for a given value of the explanatory variable x

b is the slope, the amount by which y is predicted to change when x increases by one unit

a is the y intercept, the predicted value of y when x = 0
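
In code, using the regression line to predict is just evaluating ŷ = a + bx. A minimal sketch (the line ŷ = 10 + 2x is hypothetical):

```python
def predict(a, b, x):
    """Predicted value ŷ = a + b*x for the regression line with
    y intercept a and slope b."""
    return a + b * x

# Hypothetical line ŷ = 10 + 2x: when x increases by one unit,
# the prediction changes by the slope, 2
y_hat = predict(10, 2, 5)  # ŷ for x = 5
```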

Extrapolation

Extrapolation is the use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line

Such predictions are often not accurate

Residual

A residual is the difference between an observed value of the response variable and the value predicted by the regression line

That is,

residual = observed y - predicted y

residual = y - ŷ
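
The definition translates directly into code; a small sketch with made-up observed and predicted values:

```python
def residuals(observed_y, predicted_y):
    """residual = observed y - predicted y, point by point."""
    return [y - y_hat for y, y_hat in zip(observed_y, predicted_y)]

# Hypothetical values: a positive residual means the line underpredicts,
# a negative residual means it overpredicts
res = residuals([3, 5, 4], [2, 6, 4])
```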

Least-squares regression line

The least-squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible

Equation of the least-squares regression line

We have data on an explanatory variable x and a response variable y for n individuals

From the data, calculate the means x̄ and ȳ and the standard deviations sx and sy of the two variables and their correlation r

The least-squares regression line is the line ŷ = a + bx with slope

b = r(sy / sx)

and y intercept

a = ȳ - bx̄
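
These two formulas can be applied directly to the summary statistics; a minimal sketch using hypothetical summaries:

```python
def least_squares_line(x_bar, y_bar, s_x, s_y, r):
    """Slope b = r(s_y / s_x) and intercept a = ȳ - b*x̄
    of the least-squares regression line."""
    b = r * (s_y / s_x)
    a = y_bar - b * x_bar
    return a, b

# Hypothetical summaries: x̄ = 3, ȳ = 4, s_x = 1, s_y = 2, r = 0.5
a, b = least_squares_line(3, 4, 1, 2, 0.5)
```

Note that the line always passes through the point (x̄, ȳ), which is exactly what the intercept formula a = ȳ − bx̄ guarantees.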

Residual plot

A residual plot is a scatterplot of the residuals against the explanatory variable

Residual plots help us assess how well a regression line fits the data

Standard deviation of the residuals (s)

If we use the least-squares line to predict the values of a response variable y from an explanatory variable x, the standard deviation of the residuals (s) is given by

s = √(Σresiduals^2 / (n - 2)) = √(Σ(yᵢ - ŷᵢ)^2 / (n - 2))

This value gives the approximate size of a "typical" or "average" prediction error (residual)
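
The formula is a short computation; this sketch uses made-up observed and predicted values chosen so the residuals are easy to check by hand:

```python
from math import sqrt

def residual_sd(observed_y, predicted_y):
    """s = sqrt(sum of squared residuals / (n - 2)),
    the approximate size of a typical prediction error."""
    n = len(observed_y)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed_y, predicted_y))
    return sqrt(sse / (n - 2))

# Hypothetical values: the residuals are 0, 0, -1, 1, so SSE = 2,
# n - 2 = 2, and s = 1
s = residual_sd([1, 2, 3, 5], [1, 2, 4, 4])
```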

The coefficient of determination: r^2 in regression

The coefficient of determination r^2 is the fraction of the variation in the values of y that is accounted for by the least-squares regression line of y on x

We can calculate r^2 using the following formula:

r^2 = 1 - SSE / SST

where SSE = Σresidual^2 and SST = Σ(yᵢ - ȳ)^2
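
The 1 − SSE/SST formula can be checked on two extreme cases; this minimal sketch uses hypothetical data:

```python
def r_squared(observed_y, predicted_y):
    """r^2 = 1 - SSE/SST: the fraction of the variation in y
    accounted for by the regression line."""
    y_bar = sum(observed_y) / len(observed_y)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed_y, predicted_y))
    sst = sum((y - y_bar) ** 2 for y in observed_y)
    return 1 - sse / sst

# A perfect fit accounts for all of the variation (r^2 = 1);
# predicting ȳ for every x accounts for none of it (r^2 = 0)
perfect = r_squared([1, 2, 3], [1, 2, 3])
useless = r_squared([1, 2, 3], [2, 2, 2])
```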

Outliers in regression

An outlier is an observation that lies outside the overall pattern of the other observations

Points that are outliers in the y direction but not in the x direction of a scatterplot have large residuals

Other outliers may not have large residuals

Influential observations in regression

An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation

Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line
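
To see the effect, fit the line with and without an x-outlier and compare slopes; the data here are made up so that the clean points fall exactly on y = 2x:

```python
from statistics import mean

def least_squares(x, y):
    """Least-squares slope b and intercept a computed from raw data."""
    x_bar, y_bar = mean(x), mean(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b

# Hypothetical data: the first four points lie exactly on y = 2x,
# and (20, 5) is an outlier in the x direction
x = [1, 2, 3, 4, 20]
y = [2, 4, 6, 8, 5]

_, b_with = least_squares(x, y)             # slope pulled toward the outlier
_, b_without = least_squares(x[:4], y[:4])  # slope of the clean points: 2
```

Removing the single x-outlier changes the slope from 0.04 to 2, so that observation is influential.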
