Terms in this set (18)
what does a scatterplot show?
the relationship between a quantitative explanatory variable x and a quantitative response variable y; regression inference applies when that relationship is linear
what can we use to predict y for a given value of x?
the least-squares line fitted to the data
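A minimal sketch of fitting the least-squares line and using it to predict y. The data values and the function name `least_squares_line` are made up for illustration; only the usual formulas b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)^2 and a = ȳ - b·x̄ are assumed.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the line passes through (x-bar, y-bar).
    a = y_bar - b * x_bar
    return a, b

# Hypothetical sample data (not from the flashcards).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
y_hat = a + b * 6  # predicted y for x = 6
```

For this made-up sample the fit is roughly ŷ = 2.2 + 0.6x, so the prediction at x = 6 is about 5.8.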
population regression line
the true regression line, μy = α + βx - based on all observations in the population
sample regression line
the estimated regression line, y-hat = a + bx - the least-squares regression line (LSRL) computed from a sample
what do the statistics a and b stand for?
for a least-squares line, a is the intercept and b is the slope. they are calculated from sample data and would differ if we repeated the data production process. a is an estimate of the population intercept, α, and b is an estimate of the population slope, β.
what are the conditions for regression inference and how do you check them?
LINEAR: the (true) relationship between x and y is linear - check residuals and scatterplot
INDEPENDENT: individual observations are independent of each other - check the 10% condition when sampling without replacement; in an experiment, check that the results for different subjects are independent
NORMAL: make stemplot/histogram/normal probability plot of RESIDUALS and check for clear skewness or other major departures from normality
EQUAL VARIANCE: look at the scatter of the RESIDUALS above and below the residual=0 line in the residual plot. the amount of scatter should be roughly the same from the smallest to the largest x value
RANDOM: data come from a well designed random sample or randomized experiment
if we calculate the least squares regression line, are the statistics a and b unbiased estimators of the parameters α and β?
yes, provided the conditions for regression inference are met
what does population standard deviation, σ, mean?
σ describes the variability of the response y about the population regression line and is estimated by the standard deviation of the residuals
s = sqrt(Σ(residual)^2 / (n-2))
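A short sketch of this calculation, assuming a line ŷ = 2.2 + 0.6x has already been fitted to a made-up sample. The data, the coefficients, and the name `residual_sd` are illustrative only.

```python
import math

def residual_sd(xs, ys, a, b):
    """Standard deviation of the residuals, s, using n - 2 in the denominator."""
    # Residual = observed y minus predicted y.
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r ** 2 for r in residuals) / (len(xs) - 2))

# Hypothetical sample and its least-squares coefficients.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
s = residual_sd(xs, ys, a=2.2, b=0.6)
```

Here the squared residuals sum to about 2.4, so s is about sqrt(2.4 / 3), roughly 0.894.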
what do residuals of the LSRL computed from sample data estimate?
they estimate how much y varies about the population line.
how do we estimate the spread of the sampling distribution of the slope b?
with the standard error of the slope -> SEb = s / (sx * sqrt(n-1)), which is used with a t distribution with n-2 degrees of freedom
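As a rough illustration, the standard error of the slope can be computed from s and the sample standard deviation of the x values. The numbers and the name `slope_standard_error` are hypothetical.

```python
import math
from statistics import stdev

def slope_standard_error(s, xs):
    """SEb = s / (s_x * sqrt(n - 1)), where s_x is the sample SD of x."""
    return s / (stdev(xs) * math.sqrt(len(xs) - 1))

# Hypothetical inputs: s is the standard deviation of the residuals.
xs = [1, 2, 3, 4, 5]
s = math.sqrt(0.8)
se_b = slope_standard_error(s, xs)
```

Because s_x * sqrt(n - 1) equals sqrt(Σ(x - x̄)^2), this is the same as dividing s by the square root of the sum of squared x-deviations; here that gives about 0.283.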
how do σ and SEb differ?
σ=variability of y about the population regression line
SEb=estimated spread (standard deviation) of the sampling distribution of the slope b
confidence interval for the slope β of the population
statistic ± (critical value)(standard deviation of statistic)
b ± t* x SEb (t interval for slope), where t* is the critical value from the t distribution with df=n-2
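A small sketch of the interval calculation. The slope, standard error, and function name are made up; the critical value 3.182 is the t-table value for 95% confidence with df = 3, so this example assumes a sample of n = 5.

```python
def slope_confidence_interval(b, se_b, t_star):
    """Return (low, high) for the interval b ± t* · SEb."""
    margin = t_star * se_b
    return b - margin, b + margin

# Hypothetical values: b and SEb from a sample of n = 5 points,
# t* looked up in a t table for 95% confidence with df = n - 2 = 3.
low, high = slope_confidence_interval(b=0.6, se_b=0.2828, t_star=3.182)
```

With these inputs the interval runs from about -0.30 to about 1.50. It contains 0, so this made-up sample would not give convincing evidence of a nonzero slope.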
test statistic formula for the slope of a LSR line
t = (b - β0) / SEb
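A minimal sketch of the test statistic, with made-up values for the slope and its standard error. The name `slope_t_statistic` and the default hypothesized slope of 0 are assumptions for illustration.

```python
def slope_t_statistic(b, se_b, beta_0=0.0):
    """t = (b - beta_0) / SEb; compare against a t distribution with n - 2 df."""
    return (b - beta_0) / se_b

# Hypothetical sample slope and standard error, testing a hypothesized slope of 0.
t = slope_t_statistic(b=0.6, se_b=0.2828)
```

This gives t of about 2.12, which would then be looked up in the t table using df = n - 2.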
how do you find the p value for a t test?
with df=n-2, you can use the t distribution/table
what do hypotheses look like for t test of slope?
H0: β = β0 (the hypothesized value, often 0)
Ha: β > β0, β < β0, or β ≠ β0
what is transforming data?
applying a function such as a square root or logarithm to a quantitative variable
using _________ is a much more efficient method for "linearizing" a curved pattern in a scatterplot
LOGARITHMS
once you get the y-hat value, what do you do with it if using logarithms?
undo the logarithm: take e^value if the model predicts ln y (or 10^value if it predicts log base 10 of y)
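A brief sketch of the back-transformation, assuming a hypothetical fitted model ln(ŷ) = 1.0 + 0.5x that uses the natural log of y. The coefficients and the name `predict_from_log_model` are invented for illustration.

```python
import math

def predict_from_log_model(a, b, x):
    """Predict y when the fitted line is ln(y-hat) = a + b*x."""
    ln_y_hat = a + b * x       # prediction on the log scale
    return math.exp(ln_y_hat)  # undo the natural log to return to the original units

# Hypothetical coefficients; at x = 2 the log-scale prediction is 2.0.
y_pred = predict_from_log_model(a=1.0, b=0.5, x=2)
```

Here the predicted y is e^2, about 7.39. If the model had been fitted to log base 10 of y instead, the last step would be 10 ** value rather than math.exp.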
