### Explanatory variable

An explanatory variable may help explain or influence changes in a response variable

### Scatterplot

A scatterplot shows the relationship between two quantitative variables measured on the same individuals

The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis

Each individual in the data appears as a point in the graph

### Positive association

Two variables have a positive association when above-average values of one tend to accompany above-average values of the other, and when below-average values also tend to occur together

### Negative association

Two variables have a negative association when above-average values of one tend to accompany below-average values of the other

### Correlation r

The correlation r measures the direction and strength of the linear relationship between two quantitative variables
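As a sketch, r can be computed as the average product of standardized values (dividing by n - 1); the small dataset below is invented purely for illustration.

```python
from statistics import mean, stdev

# Invented example data: two quantitative variables measured on the same individuals
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample standard deviations

# r is the average product of the standardized values of x and y
r = sum((xi - x_bar) / s_x * (yi - y_bar) / s_y
        for xi, yi in zip(x, y)) / (n - 1)
print(round(r, 4))
```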

### Regression line

A regression line is a line that describes how a response variable y changes as an explanatory variable x changes

We often use a regression line to predict the values of y for a given value of x

### Regression line, predicted value, slope, y intercept

Suppose that y is a response variable and x is an explanatory variable

A regression line relating y to x has an equation of the form

ŷ = a + bx

In this equation,

ŷ is the predicted value of the response variable y for a given value of the explanatory variable x

b is the slope, the amount by which y is predicted to change when x increases by one unit

a is the y intercept, the predicted value of y when x = 0
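A minimal sketch of using the equation ŷ = a + bx for prediction; the intercept and slope values here are hypothetical, not taken from any dataset.

```python
# Hypothetical regression line with y intercept a = 1.5 and slope b = 0.8
a, b = 1.5, 0.8

def predict(x):
    """Return the predicted value y-hat = a + b*x."""
    return a + b * x

# The prediction at x = 10, and the change in the prediction
# when x increases by one unit (which equals the slope b)
y_hat = predict(10)
one_unit_change = predict(11) - predict(10)
print(y_hat, one_unit_change)
```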

### Extrapolation

Extrapolation is the use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line

Such predictions are often not accurate

### Residual

A residual is the difference between an observed value of the response variable and the value predicted by the regression line

That is,

residual = observed y - predicted y

residual = y - ŷ
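A quick sketch of the residual formula for one observation, using a hypothetical line and an invented data point.

```python
# Hypothetical line y-hat = 1.5 + 0.8x and one observed point
a, b = 1.5, 0.8
x_obs, y_obs = 4.0, 5.2

y_hat = a + b * x_obs       # predicted y from the regression line
residual = y_obs - y_hat    # observed y - predicted y
print(round(residual, 4))
```

A positive residual means the line underpredicts that observation; a negative residual means it overpredicts.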

### Least-squares regression line

The least-squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible

### Equation of the least-squares regression line

We have data on an explanatory variable x and a response variable y for n individuals

From the data, calculate the means x̄ and ȳ and the standard deviations Sx and Sy of the two variables, and their correlation r

The least-squares regression line is the line ŷ = a + bx with slope

b = r(Sy / Sx)

and y-intercept

a = ȳ - bx̄
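The slope and intercept formulas can be sketched directly from the summary statistics; the dataset below is invented for illustration.

```python
from statistics import mean, stdev

# Invented data: x is the explanatory variable, y is the response variable
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)

# Correlation r, then slope and intercept of the least-squares line
r = sum((xi - x_bar) * (yi - y_bar)
        for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)
b = r * s_y / s_x        # slope: b = r(Sy / Sx)
a = y_bar - b * x_bar    # y-intercept: a = y-bar - b*x-bar
print(round(b, 4), round(a, 4))
```

Note that the line always passes through the point (x̄, ȳ), which is exactly what the intercept formula enforces.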

### Residual plot

A residual plot is a scatterplot of the residuals against the explanatory variable

Residual plots help us assess how well a regression line fits the data

### Standard deviation of the residuals (s)

If we use the least-squares line to predict the values of a response variable y from an explanatory variable x, the standard deviation of the residuals (s) is given by

s = √(Σresiduals^2 / (n - 2)) = √(Σ(yᵢ - ŷᵢ)^2 / (n - 2))

This value gives the approximate size of a "typical" or "average" prediction error (residual)
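As a sketch, s can be computed from the residuals of a fitted line; the data and the least-squares line fit to it (intercept 2.2, slope 0.6) are invented for illustration.

```python
from math import sqrt

# Invented data and the least-squares line y-hat = 2.2 + 0.6x fit to it
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6
n = len(x)

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Divide the sum of squared residuals by n - 2, then take the square root
s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))
print(round(s, 4))
```

The divisor is n - 2 rather than n because two quantities (the slope and the intercept) were estimated from the data.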

### The coefficient of determination: r^2 in regression

The coefficient of determination r^2 is the fraction of the variation in the values of y that is accounted for by the least-squares regression line of y on x

We can calculate r^2 using the following formula:

r^2 = 1 - SSE / SST

where SSE = Σresiduals^2 and SST = Σ(yᵢ - ȳ)^2
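A sketch of the r^2 formula, using invented data and a line fit to it by least squares (intercept 2.2, slope 0.6).

```python
from statistics import mean

# Invented data with its least-squares line y-hat = 2.2 + 0.6x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6
y_bar = mean(y)

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # sum of squared residuals
sst = sum((yi - y_bar) ** 2 for yi in y)                     # total variation in y
r2 = 1 - sse / sst
print(round(r2, 4))
```

For these values r^2 equals the square of the correlation r, which is why the coefficient of determination carries that name.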

### Outliers in regression

An outlier is an observation that lies outside the overall pattern of the other observations

Points that are outliers in the y direction but not in the x direction of a scatterplot have large residuals

Other outliers may not have large residuals