Stat 212 - Ch. 3, Scatterplots and Correlation

Created by RheaWong 

- Explanatory and response variables. - Displaying relationships: scatterplots. - Interpreting scatterplots. - Adding categorical variables to scatterplots. - Measuring linear association: correlation. - Facts about correlation.

Scatterplot

A graph used to display the relationship between two quantitative variables measured on the same individuals.

Response Variable
Dependent Variable

Measures or records an outcome of a study.

Explanatory Variable
Independent Variable

May explain or influence changes in a response variable.

Form

A way of describing a scatterplot relationship.
Linear, curved, clusters, or no pattern.

Direction

A way of describing a scatterplot relationship.
Positive, negative, no direction.

When examining a graph, examine...

The overall pattern, and any striking deviations from that pattern.

Strength

A way of describing a scatterplot relationship.
How closely the points follow the form: weak or strong.

Positive Association

High values of one variable tend to occur together with high values of the other variable.

Negative Association

High values of one variable tend to occur together with low values of the other variable.

Relationships between categorical data

Cannot be shown in a scatterplot or measured by correlation; both require quantitative variables.

Scatter

The variation of the data points around the overall pattern of the graph. Used to judge strength: less scatter means a stronger relationship.

Adding categorical data to scatterplots

Use points with different shapes/colors.

Outliers

Points that fall outside the overall pattern of the scatterplot. A point that follows the pattern is not actually an outlier.

Correlation Coefficient (definition)

A measure of the direction and strength of the linear relationship between two quantitative variables; calculated using the mean and standard deviation of both the x and y variables. Written as "r".

Correlation Coefficient (equation)

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

$\bar{x}$ = mean of the explanatory (independent) variable
$\bar{y}$ = mean of the response (dependent) variable
$s_x$ = standard deviation of the explanatory (independent) variable
$s_y$ = standard deviation of the response (dependent) variable
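
The formula can be sketched in Python as a sum of products of standardized values. This is a minimal illustration using only the standard library; the function name `correlation` and the data (`hours`, `scores`) are made up for the example and do not come from the course.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: average product of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data: hours studied and exam scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = correlation(hours, scores)
print(round(r, 3))
```

Each term in the sum is positive when a point is above (or below) average on both variables, which is why a positive association gives a positive r.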

Correlation makes no distinction between explanatory and response variables.

It doesn't matter what's called what in calculating the correlation.
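
A quick check of this fact, as a sketch with made-up data: swapping which variable is passed as x and which as y leaves r unchanged. The helper `correlation` is a hypothetical name defined here for the example.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r from standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Hypothetical measurements on the same five individuals.
a = [1, 2, 3, 4, 5]
b = [52, 60, 61, 70, 77]

r_ab = correlation(a, b)  # a treated as explanatory, b as response
r_ba = correlation(b, a)  # roles swapped
print(abs(r_ab - r_ba) < 1e-12)
```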

r does not change when we change the units of measurement of x, y, or both

In calculating correlation, elements are standardized; standardizing eliminates units.
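
The sketch below illustrates this with invented height and weight data: converting inches to centimeters and pounds to kilograms rescales every value, but the standardized values, and therefore r, stay the same up to rounding error. All names and numbers are assumptions made for the example.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r from standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Hypothetical data in US units.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [115, 120, 140, 155, 160, 180]

# The same data converted to metric units.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.4536 for w in weights_lb]

r_us = correlation(heights_in, weights_lb)
r_metric = correlation(heights_cm, weights_kg)
print(abs(r_us - r_metric) < 1e-9)
```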

The correlation r is always a number between -1 and 1.

A negative r indicates a negative association and a positive r indicates a positive association; r = 0 indicates no linear relationship.
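
As an illustration, r reaches its extreme values only when the points fall exactly on a straight line. The sketch below uses made-up points on an upward line and on a downward line; the helper name `correlation` is an assumption for the example.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r from standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

x = [1, 2, 3, 4, 5]
y_up = [2 * v + 1 for v in x]      # points exactly on a line sloping up
y_down = [-3 * v + 10 for v in x]  # points exactly on a line sloping down

r_up = correlation(x, y_up)
r_down = correlation(x, y_down)
print(round(r_up, 6), round(r_down, 6))
```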

Correlation only works for linear relationships

Correlation measures only linear association. It does not describe curved relationships, however strong they are.
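
A small sketch of why this matters, using invented data: below, y is completely determined by x (y = x squared), yet r comes out as essentially zero because the relationship is curved rather than linear. The helper name `correlation` is hypothetical.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r from standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # a perfect, but curved, relationship

r_curved = correlation(x, y)
print(abs(r_curved) < 1e-9)
```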

Correlation and resistance

Correlation is not resistant: r can be strongly affected by outliers.
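
To see how much a single point can matter, the sketch below computes r for a made-up data set with a strong positive pattern, then again after adding one outlier. The data and the helper name `correlation` are assumptions for illustration only.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r from standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Hypothetical data with a strong positive linear pattern.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 3, 5, 4, 6, 7, 8, 9]
r_clean = correlation(x, y)

# The same data plus one point far below the pattern.
r_with_outlier = correlation(x + [9], y + [0])

print(round(r_clean, 2), round(r_with_outlier, 2))
```

In this example one added point is enough to pull r from close to 1 down to a weak value.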

Correlation from averaged data

Correlation calculated from averaged data is typically much stronger than correlation calculated from raw data points because averaging reduces some scatter.
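
The effect can be sketched with invented data: several individuals are measured at each of four ages, and r is computed once from the raw points and once from the mean value at each age. The names (`ages`, `values`, `correlation`) and numbers are hypothetical.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r from standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Hypothetical raw data: three individuals measured at each age.
ages = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
values = [10, 20, 30, 15, 25, 35, 20, 30, 40, 25, 35, 45]
r_raw = correlation(ages, values)

# Average the values within each age group, then correlate the averages.
groups = {}
for age, value in zip(ages, values):
    groups.setdefault(age, []).append(value)
group_ages = sorted(groups)
group_means = [mean(groups[age]) for age in group_ages]
r_means = correlation(group_ages, group_means)

print(round(r_raw, 2), round(r_means, 2))
```

Averaging removes the scatter among individuals of the same age, so the group means line up far more tightly than the raw points do.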
