### residual

y - y hat, the difference between the actual y-value in a scatterplot and the predicted y-value in the LSRL
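A minimal sketch of the calculation, assuming a made-up LSRL (the intercept 2.0, slope 0.5, and data point are hypothetical and only illustrate the arithmetic):

```python
def predict(x, intercept=2.0, slope=0.5):
    """Predicted y-value (y hat) from a hypothetical LSRL."""
    return intercept + slope * x


def residual(actual_y, predicted_y):
    """Residual = actual y minus predicted y."""
    return actual_y - predicted_y


# Hypothetical observed point (x = 10, y = 8.2)
y_hat = predict(10.0)        # 2.0 + 0.5 * 10 = 7.0
res = residual(8.2, y_hat)   # positive: the point lies above the line
print(round(res, 2))
```

A positive residual means the line under-predicted the actual value; a negative one means it over-predicted.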

### Mean and StanDev of a Binomial Random Variable

mean: mu of x = np

stan.dev.: sigma of x = sq.root (np(1-p))
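A quick sketch of both formulas in Python; the values n = 20 and p = 0.3 are made up for illustration:

```python
import math


def binomial_mean_sd(n, p):
    """Return (mean, standard deviation) of a binomial random variable."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean, sd


# Hypothetical example: 20 trials, success probability 0.3
mean, sd = binomial_mean_sd(20, 0.3)
print(round(mean, 3), round(sd, 3))
```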

### Significance Test 4-step process

STATE: hypotheses, significance level, parameters

PLAN: choose inference method, check conditions

DO: perform calculations, compute test stat, find P-value

CONCLUDE: interpret the result of your test in the context of your problem

### Chi-Square Test df and expected counts

Goodness of Fit: df = #categories - 1, exp counts = sample size x hypo. prop. in each category

Homogeneity/Independence: df = (#columns - 1)(#rows - 1), exp counts = (row total)(column total)/table total
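The df and expected-count rules above can be sketched as follows. The function names and the example counts are hypothetical, chosen only to show the arithmetic:

```python
def gof_expected(sample_size, hypothesized_props):
    """Goodness of fit: expected count = sample size x hypothesized proportion."""
    return [sample_size * p for p in hypothesized_props]


def gof_df(num_categories):
    return num_categories - 1


def two_way_expected(table):
    """Two-way table: expected count = (row total)(column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def two_way_df(table):
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical 2x2 table of observed counts
observed = [[20, 30], [30, 20]]
expected = two_way_expected(observed)   # every cell is 50 * 50 / 100 = 25.0
df = two_way_df(observed)               # (2 - 1)(2 - 1) = 1
```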

### interpreting a residual plot

curved pattern = linear may not be appropriate

small residuals = predictions will be pretty precise

inc/dec spread = predictions for larger/smaller values of x will be more variable

### complementary events

two mutually exclusive events whose union is the sample space (rain/not rain etc)

### binomial distribution: calc usage

exactly x: binompdf(n, p, x)

at most x: binomcdf(n, p, x)

less than x: binomcdf(n, p, x-1)

at least x: 1 - binomcdf(n, p, x-1)

more than x: 1 - binomcdf(n, p, x)
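For checking calculator answers, here is a sketch that mirrors `binompdf` and `binomcdf` in plain Python. These are hand-written stand-ins, not the calculator's own code, and n = 10, p = 0.5, x = 3 are made-up values:

```python
from math import comb


def binompdf(n, p, x):
    """P(X = x) for a binomial random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomcdf(n, p, x):
    """P(X <= x): add up P(X = k) for k = 0..x."""
    return sum(binompdf(n, p, k) for k in range(x + 1))


n, p, x = 10, 0.5, 3
exactly = binompdf(n, p, x)            # P(X = 3)
at_most = binomcdf(n, p, x)            # P(X <= 3)
less_than = binomcdf(n, p, x - 1)      # P(X < 3)
at_least = 1 - binomcdf(n, p, x - 1)   # P(X >= 3)
more_than = 1 - binomcdf(n, p, x)      # P(X > 3)
```

Note that "at least" and "less than" are complements, as are "more than" and "at most", so each pair adds to 1.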

### CI 4 step process

STATE parameter, confidence level

PLAN inference method, check conditions

DO calculations

CONCLUDE interpret interval in the context of the problem

### types of chi-square tests

Goodness of fit: tests the distribution of one categorical variable from a single sample against a hypothesized distribution

Homogeneity: when you have a sample from 2 or more indep. pop.s or 2 or more groups in an experiment

Association: when you have a single sample from a single population with 2 variables

### calc tips: normalcdf and invnorm

normalcdf: min, max, mean, standev

invnorm: area to the left as a decimal, mean, standev
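A rough Python equivalent of the two calculator commands, built on the standard library's `statistics.NormalDist`. The wrapper names `normalcdf` and `invnorm` are just chosen to match the calculator; they are not library functions:

```python
from statistics import NormalDist


def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


def invnorm(area_left, mean=0.0, sd=1.0):
    """Value with the given area (as a decimal) to its left."""
    return NormalDist(mean, sd).inv_cdf(area_left)


middle_area = normalcdf(-1.96, 1.96)   # about 0.95 on the standard normal
cutoff = invnorm(0.975)                # about 1.96
```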

### use of a control group

gives the researchers a comparison group to be used to evaluate the effectiveness of the treatments

### conditions for geometric distributions

Binomial - can be classified as success/failure

Independent

Trials - goal must be to count #trials until the first success

Success - prob of success must be the same for each trial

### 2 sided test from a confidence interval

we do/don't have enough evidence to reject Ho: mu = __ in favor of Ha: mu (does not equal) __ at the a = 0.05 level because __ falls outside/inside the 95% CI.

a = 1 - confidence level (e.g., a 95% CI matches a = 1 - 0.95 = 0.05)
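The decision rule can be sketched in a few lines. The function names, the hypothesized value, and the interval endpoints below are all hypothetical:

```python
def alpha_from_confidence(confidence_level):
    """Significance level that matches a two-sided test to a CI."""
    return 1 - confidence_level


def two_sided_decision(null_value, ci_low, ci_high):
    """Reject H0 only if the hypothesized value falls outside the interval."""
    if ci_low <= null_value <= ci_high:
        return "fail to reject H0"
    return "reject H0"


# Hypothetical: H0: mu = 50, 95% CI = (51.2, 54.8)
alpha = alpha_from_confidence(0.95)
decision = two_sided_decision(50, 51.2, 54.8)   # 50 is outside the interval
```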

### Conditions: inference for proportions

Random, Normal (at least 10 succ/failures in both groups if it's 2sample), Independent (10% condition)

### SRS

a sample taken in such a way that every set of n individuals has an equal chance to be the sample selected

### Conditions: binomial distribution

B: binomial (success/failure)

I: independent

N: number (of trials must be fixed)

S: success (prob. of success must stay constant)

### finding sample size given ME

one mean: m = z*(sigma/sq.rt.(n))

proportion: m = z*sq.rt.(p(1-p)/n)

solve for n
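As a sketch of the "solve for n" step for a proportion: rearranging m = z*sq.rt.(p(1-p)/n) gives n = (z*/m)^2 p(1-p), rounded up. The function name is hypothetical, and using p = 0.5 when no estimate is available is an assumption (it gives the largest, most conservative n):

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, p_guess=0.5):
    """Smallest n whose margin of error is at most `margin`."""
    # Critical value z* leaves (1 - confidence)/2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    return math.ceil(n)   # always round up to a whole individual


# Hypothetical: margin of error 3% at 95% confidence
n_needed = sample_size_for_proportion(0.03)
```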

### mean/standev of a sum of 2 random variables

mean: mu of x+y = mu x + mu y

standev: sigma of x+y = sq.rt. (sigma x^2 + sigma y^2) (only if x and y are independent)
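A small sketch of the sum rules, assuming the two variables are independent (the standev formula does not hold otherwise). The means and standevs are made up:

```python
import math


def sum_mean_sd(mu_x, sd_x, mu_y, sd_y):
    """Mean and standard deviation of X + Y for independent X and Y."""
    mean = mu_x + mu_y
    # Variances add; standard deviations do not
    sd = math.sqrt(sd_x**2 + sd_y**2)
    return mean, sd


# Hypothetical: X has mean 10, standev 3; Y has mean 20, standev 4
mean_sum, sd_sum = sum_mean_sd(10, 3, 20, 4)   # 30 and sqrt(9 + 16) = 5.0
```

Note that the result is 5, not 3 + 4 = 7: add the variances first, then take the square root.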

### can we generalize the results to the population of interest?

Yes, if a large random sample was taken from the same population we hope to draw conclusions about.

### factors that affect power

sample size (inc sample size = inc power), alpha (a 5% sig.test will have a greater chance of rejecting Ho than a 1% test), consider an alternative farther from mu 0 (values of mu that are in Ha but lie close to the hyp. value are harder to detect than ones farther away)

### linear transformations

adding a to every member of a data set adds a to the measures of position but does not change the measures of spread and shape, multiplying every member by b multiplies the measures of position by b and multiplies most measures of spread by b but does not change the shape

### LSRL "SE of b"

measures the standev of the estimated slope for predicting the y-variable from the x-variable/how far the estimated slope will be from the true slope on average

### experiment vs obs. study

a study is an experiment ONLY if researchers impose a treatment on the exp. units

### mean and standev of a difference of 2 random variables

mean: mu of x-y = mu x - mu y

standev: sigma of x-y = sq.rt. (sigma x^2 + sigma y^2) (variances ADD even for a difference; only if x and y are independent)

### P-value

assuming that the null is true, the _______ measures the chance of observing a statistic (or difference) as extreme as or more extreme than the one observed, in the direction of Ha

### mean/standev of a discrete random variable

mean: mu x = sum (xp)

standev: sigma x = sq.rt. (sum ((x-mu)^2 p))
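A sketch of both calculations for a made-up probability distribution. The standev here weights each squared deviation (x - mu)^2 by its probability before taking the square root:

```python
import math


def discrete_mean_sd(values, probs):
    """Mean and standard deviation of a discrete random variable."""
    mean = sum(x * p for x, p in zip(values, probs))
    variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, math.sqrt(variance)


# Hypothetical distribution: number of heads in 2 fair coin flips
values = [0, 1, 2]
probs = [0.25, 0.5, 0.25]
mu, sigma = discrete_mean_sd(values, probs)   # mean 1.0, variance 0.5
```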

### bias

systematic favoring of certain outcomes due to flawed sample selection, poor wording, undercoverage, nonresponse, etc

### 2-sample t-test phrasing hints and Ho/Ha

key phrase: difference in the means!

Ho: mu1 - mu2 = 0 OR mu1 = mu2

Ha: mu1 - mu2 < 0, > 0, not equal to 0

### 2-sample t-test conclusion

we do/don't have enough evidence at the a = ? level to conclude that the difference between the mean ? for all ? and the mean ? for all ? is ?

### standev

measures spread by giving the typical or average distance that the observations are away from their mean

### goal/benefit of blocking

goal: to create groups of homogeneous exp units

benefit: the reduction of the effect of variation within the exp units

### expected value/mean

the long-run average outcome of a random phenomenon carried out a very large number of times

### unbiased estimator

a statistic whose sampling distribution is centered at the true value of the pop. parameter, so there is no systematic tendency to over- or under-estimate it

### paired t-test: phrasing hints and Ho/Ha

key phrase: mean difference

Ho: mu diff = 0

Ha: mu diff < 0, > 0, not equal to 0

### paired t-test: conclusion

we do/don't have enough evidence at the ? a level to conclude that the mean difference in ? for all ? is ?

### experimental designs

Completely Randomized Design, Randomized Block Design (homogeneous blocks), Matched Pairs

### Central Limit Theorem

if the pop. dist. is normal, the samp. dist. will also be normal. also, as n increases, the samp. dist.'s standev will decrease.

if the pop.dist. is not normal, the samp. dist. will become more and more normal as n increases.

### Confidence Interval

Intervals produced with this method will capture the true population proportion/mean of ______ in about ___% of all possible samples of this same size from this same population.

### Conditions: inference for regression

linear (true relationship between x and y), independent (10%), normal (residuals about the regression line), equal variance (around regression line), random

### LSRL slope 'b'

for every one unit change in x the y variable is predicted to increase/decrease by ____ units.

### why large samples give more trustworthy results

large samples yield more precise results than small samples because in a large sample the values of the sample statistic tend to be closer to the true population parameter

### confidence interval conclusion

I am ___% confident that the interval from ___ to ___ captures the true _____.