Correlation and Regression
Terms in this set (39)
Which axis do the response and explanatory variables go on?
Y-axis: response
X-axis: explanatory
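What does cut() do in a scatterplot?
It breaks a numeric explanatory variable into intervals (a factor), so the scatterplot can be broken down into side-by-side sections, such as one boxplot per interval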
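cut() is a base R function that turns a numeric vector into a factor of intervals. A minimal sketch, assuming the built-in mtcars data set (the flashcards do not name a dataset), with car weight (wt) as the explanatory variable; the ggplot2 line is only a commented, hypothetical usage.

```r
# Bin the numeric explanatory variable (car weight, in 1000 lbs) into 5 equal-width intervals
wt_bins <- cut(mtcars$wt, breaks = 5)

# cut() returns a factor with one level per interval
levels(wt_bins)

# Number of cars falling in each interval
table(wt_bins)

# Hypothetical ggplot2 usage (not run here): one boxplot of mpg per weight interval
# ggplot(mtcars, aes(x = cut(wt, breaks = 5), y = mpg)) + geom_boxplot()
```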
What number symbolizes a perfect correlation?
1 (or -1 for a perfect negative correlation)
Full name of the most common type of correlation
Pearson product-moment correlation
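The Pearson correlation can be computed directly from its definition and checked against cor(), which uses method = "pearson" by default. A sketch assuming mtcars, with mpg as the response and wt as the explanatory variable.

```r
x <- mtcars$wt   # explanatory: weight
y <- mtcars$mpg  # response: miles per gallon

# Pearson correlation from its definition: the sum of cross-products of the
# deviations from the means, divided by the square root of the product of
# the two sums of squared deviations (equivalently, cov(x, y) / (sd(x) * sd(y)))
r_by_hand <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Built-in Pearson correlation
r_builtin <- cor(x, y)

all.equal(r_by_hand, r_builtin)
```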
ggplot2 function for drawing a line with a chosen intercept and slope (e.g., a regression line), and what are its two arguments?
geom_abline()
intercept: where the line crosses the y-axis; slope: how much y changes for a one-unit increase in x
geom_abline(intercept = 0, slope = 1.7)
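ggplot2 function that automatically calculates and draws the best-fit regression line
geom_smooth(method = "lm"); without method = "lm" it fits a flexible smoother (loess or a GAM) instead of a straight line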
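What is Galton's regression to the mean?
Extreme observations tend to be followed by less extreme ones: for example, the children of very tall parents tend to be tall, but closer to the average height than their parents.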
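Regression to the mean can be seen in a small simulation. This is not Galton's data: it is a hypothetical sketch of standardized "parent" and "child" heights with an assumed correlation of 0.5.

```r
set.seed(1)
n   <- 10000
rho <- 0.5  # assumed correlation between parent and child heights (standardized)

# Simulate standardized heights with correlation rho (both have mean 0, SD 1)
parent <- rnorm(n)
child  <- rho * parent + sqrt(1 - rho^2) * rnorm(n)

# Keep only the children of unusually tall parents (more than 1.5 SD above average)
tall <- parent > 1.5
mean(parent[tall])  # far above the overall mean of 0
mean(child[tall])   # still above 0, but closer to the mean than the parents
```

Because the correlation is less than 1, the children's average is pulled back toward 0 relative to their parents.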
To find the predicted Y (the response variable), what is the calculation?
Add the intercept to the slope times the explanatory variable.
response = intercept + (slope * explanatory)
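Applying the formula by hand reproduces the fitted (predicted) values that lm() stores. A sketch assuming mtcars with mpg ~ wt; b0 and b1 are the values you would pass to geom_abline(intercept = ..., slope = ...).

```r
model <- lm(mpg ~ wt, data = mtcars)

b0 <- coef(model)[["(Intercept)"]]  # intercept (b sub 0)
b1 <- coef(model)[["wt"]]           # slope (b sub 1)

# response = intercept + (slope * explanatory), for every observation
y_hat <- b0 + b1 * mtcars$wt

# fitted() returns the same predicted values (Y hat) stored in the model
all.equal(unname(y_hat), unname(fitted(model)))
```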
term for the intercept
b sub 0
term for the slope
b sub 1
How can you find the slope?
Multiply the correlation of X and Y by the standard deviation of Y divided by the standard deviation of X: b sub 1 = r * (SD of Y / SD of X)
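The slope formula can be checked against lm(). The intercept then follows from the fact that the least squares line passes through (mean of X, mean of Y). A sketch assuming mtcars with wt as X and mpg as Y.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Slope: correlation times the ratio of standard deviations (SD of Y over SD of X)
b1 <- cor(x, y) * sd(y) / sd(x)

# Intercept: the least squares line passes through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

# Compare with the coefficients estimated by lm()
c(b0, b1)
coef(lm(y ~ x))
```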
What is Y hat?
predicted value of Y.
What does epsilon represent in the regression formula?
The noise (random error): the part of the response that the model does not explain
What does the hat on a variable (e.g., the beta-hats) signify?
An estimate: beta-hats are estimates of the true, unknown betas
Symbol for residuals
e (the residuals are the observed estimates of the noise term, epsilon)
What is the formula for finding a residual?
Y minus Y hat (observed value minus predicted value)
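A residual is the observed response minus the fitted value, and with an intercept in the model the least squares residuals sum to 0. A sketch assuming mtcars with mpg ~ wt.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Residual = observed Y minus predicted Y (Y hat)
e <- mtcars$mpg - fitted(model)

# residuals() returns the same values
all.equal(unname(e), unname(residuals(model)))

# With an intercept in the model, the residuals sum to 0 (up to floating-point rounding)
sum(e)
```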
General statistical model
response = f(explanatory) + noise
General linear model
response = intercept + (slope * explanatory) + noise
What does the residual represent, and how would you calculate it?
It is the estimate of the noise term (epsilon) for each observation, calculated as Y minus Y hat
In the least squares method, what does the sum of the residuals equal?
0 (for a model that includes an intercept)
What class is a linear model?
lm: a model fitted with lm() is an object of its own special class, "lm"
What are fitted values?
A fitted value is simply another name for a predicted value: it describes the point on the regression line that corresponds to a particular x-value
What function, from what package, lets you see the fitted values, residuals, and related calculations for every observation?
augment() from the broom package
If you want to try your own values in a model, what would you use?
The predict() function: predict(lm, newdata)
lm is a linear model object and newdata is a data frame of new explanatory values
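A sketch of predict() with newdata, assuming mtcars with mpg ~ wt; the three weights in new_cars are hypothetical values chosen for illustration. The column name in newdata must match the explanatory variable in the model formula.

```r
model <- lm(mpg ~ wt, data = mtcars)

# newdata must be a data frame whose column names match the explanatory variable(s)
new_cars <- data.frame(wt = c(2.5, 3.0, 4.5))  # hypothetical weights, in 1000 lbs

preds <- predict(model, newdata = new_cars)
preds
```

Each prediction is just intercept + (slope * wt) evaluated at the new weight.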
What is SSE?
Sum of squared errors: the sum of the squared residuals
Why do you need to square the residuals (as in SSE and RMSE) rather than just sum them?
Because the residuals summed together equal 0: positive and negative residuals cancel out
How would you calculate SSE?
1. The sum of the squared residuals: sum(.resid^2)
2. The variance of the residuals times the number of observations minus 1: (n() - 1) * var(.resid)
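The two SSE methods agree because var() divides by n - 1 and the residuals have mean 0. A base R sketch (instead of the augment() columns used above), assuming mtcars with mpg ~ wt.

```r
model <- lm(mpg ~ wt, data = mtcars)
e <- residuals(model)
n <- length(e)

# Method 1: sum of the squared residuals
sse_1 <- sum(e^2)

# Method 2: (n - 1) times the variance of the residuals.
# var() divides by n - 1 and the residuals have mean 0, so this equals sum(e^2)
sse_2 <- (n - 1) * var(e)

all.equal(sse_1, sse_2)
```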
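What is the RMSE?
Root mean squared error, which summary() reports as the residual standard error: the square root of SSE divided by the residual degrees of freedom (n - 2 for one explanatory variable)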
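A sketch of this RMSE, assuming mtcars with mpg ~ wt. It follows the residual standard error convention (divide SSE by the residual degrees of freedom); some sources instead divide by n, which gives a slightly smaller number.

```r
model <- lm(mpg ~ wt, data = mtcars)
sse <- sum(residuals(model)^2)

# Residual degrees of freedom: n minus the number of coefficients (n - 2 here)
df_resid <- df.residual(model)

rmse <- sqrt(sse / df_resid)

# sigma() returns the same quantity; summary(model) labels it "Residual standard error"
all.equal(rmse, sigma(model))
```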
What advantage does the RMSE have over the SSE?
It's in the same units as the response variable
What is Y bar?
The average (mean) of Y
What is the null model?
A model that uses the average of Y (Y bar) as the prediction for every observation
You fit it by putting 1 on the right-hand side of the formula in lm(): lm(y ~ 1, data = df)
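A sketch of the null model, assuming mtcars with mpg as the response. Its only coefficient is Y bar, so every fitted value equals the mean of the response.

```r
# The null model: intercept only, no explanatory variable
null_model <- lm(mpg ~ 1, data = mtcars)

# Its single coefficient is the mean of the response (Y bar)
coef(null_model)
mean(mtcars$mpg)

# Every fitted value equals Y bar (compared with a floating-point tolerance)
all.equal(unname(fitted(null_model)), rep(mean(mtcars$mpg), nrow(mtcars)))
```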
What is the coefficient of determination?
R squared: 1 minus the ratio of the model's SSE to the null model's SSE (the SST), i.e., R^2 = 1 - SSE / SST
What is SST?
Total sum of squares: the sum of the squared deviations of Y from Y bar, which is the SSE of the null model
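R squared can be computed from the SSE of the fitted model and the SSE of the null model (SST), then checked against summary(). A sketch assuming mtcars with mpg ~ wt; with a single explanatory variable, R squared also equals the squared correlation.

```r
model      <- lm(mpg ~ wt, data = mtcars)
null_model <- lm(mpg ~ 1, data = mtcars)

sse <- sum(residuals(model)^2)       # SSE of the fitted model
sst <- sum(residuals(null_model)^2)  # SST: SSE of the null model, sum((y - mean(y))^2)

r_squared <- 1 - sse / sst

# summary() reports the same value; with one explanatory variable it equals cor(x, y)^2
summary(model)$r.squared
cor(mtcars$wt, mtcars$mpg)^2
```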
What does the coefficient of determination indicate?
The proportion of variability in the response variable that is explained by the model
What is leverage in linear models?
A measure of how far an observation's explanatory (X) value is from the mean of X; high-leverage points have the potential to exert a large influence on the whole model.
1. Known in augment() as .hat
2. Computed from the distance of X from the mean of X
3. In general, the farther X is from the mean of X, the higher the leverage
4. Written in formulas as h sub i
5. The response (Y) value does not enter the calculation
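For a model with one explanatory variable, leverage has a closed form that depends only on X: h_i = 1/n + (x_i - mean(x))^2 / sum((x - mean(x))^2). A sketch assuming mtcars with mpg ~ wt; hatvalues() is the base R function, and broom's augment() reports the same quantity as .hat.

```r
model <- lm(mpg ~ wt, data = mtcars)
x <- mtcars$wt
n <- length(x)

# Leverage for simple linear regression: depends only on how far x_i is from mean(x)
h <- 1 / n + (x - mean(x))^2 / sum((x - mean(x))^2)

# hatvalues() computes the leverage values from the fitted model
all.equal(unname(h), unname(hatvalues(model)))

# The car whose weight is farthest from the mean weight has the highest leverage
rownames(mtcars)[which.max(h)]
```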
When is a value considered 'influential'?
When it has a large effect on the fitted linear model. It's typically a combination of high leverage AND a large residual
What is Cook's distance and how is it displayed in augment()?
.cooksd. It combines leverage (the X value's distance from the mean of X) and the residual (the Y value's distance from the regression line)
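Cook's distance can be written as D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2, where e_i is the residual, h_i the leverage, p the number of coefficients, and s the residual standard error. A sketch assuming mtcars with mpg ~ wt, checked against base R's cooks.distance().

```r
model <- lm(mpg ~ wt, data = mtcars)
e  <- residuals(model)
h  <- hatvalues(model)
p  <- length(coef(model))  # number of estimated coefficients (intercept and slope)
s2 <- sigma(model)^2       # squared residual standard error

# Cook's distance grows with both a large residual and high leverage
d <- (e^2 / (p * s2)) * h / (1 - h)^2

all.equal(unname(d), unname(cooks.distance(model)))

# The three most influential observations
head(sort(d, decreasing = TRUE), 3)
```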
Two methods for calculating SSE?
sum(.resid^2)
SSE_also = (n() - 1) * var(.resid)
In augment what is the column that has the leverage variable?
.hat