residual
y - y hat: the difference between the actual y-value in a scatterplot and the y-value predicted by the LSRL
Mean and StanDev of a Binomial Random Variable
mean: mu of x = np
stan.dev.: sigma of x = sq.root (np(1-p))
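The two formulas above can be checked with a short Python sketch. The function name `binomial_mean_sd` and the example values (100 trials, p = 0.7) are hypothetical and used only for illustration.

```python
import math

def binomial_mean_sd(n, p):
    """Return (mean, standev) of a binomial random variable X ~ B(n, p)."""
    mean = n * p                      # mu of x = np
    sd = math.sqrt(n * p * (1 - p))   # sigma of x = sq.root(np(1 - p))
    return mean, sd

# Hypothetical example: 100 free throws by a 70% shooter
mean, sd = binomial_mean_sd(100, 0.7)
print(f"mean = {mean:.1f}, standev = {sd:.3f}")
```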
Significance Test 4-step process
STATE: hypotheses, significance level, parameters
PLAN: choose inference method, check conditions
DO: perform calculations, compute test stat, find P-value
CONCLUDE: interpret the result of your test in the context of your problem
Chi-Square Test df and expected counts
Goodness of Fit: df = #categories - 1, exp counts = sample size x hypo. prop. in each category
Homogeneity/Independence: df = (#columns - 1)(#rows - 1), exp counts: (row total)(column total)/table total
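A minimal sketch of the df and expected-count rules above, assuming the two-way table is given as a list of rows of observed counts. The function names and example numbers are hypothetical.

```python
def goodness_of_fit(sample_size, hypothesized_props):
    """Expected counts and df for a chi-square goodness-of-fit test."""
    expected = [sample_size * p for p in hypothesized_props]  # sample size x hypo. prop.
    df = len(hypothesized_props) - 1                          # #categories - 1
    return expected, df

def two_way_expected(table):
    """Expected counts and df for a homogeneity/independence test on a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # (row total)(column total) / table total for every cell
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return expected, df

print(goodness_of_fit(200, [0.5, 0.3, 0.2]))
print(two_way_expected([[10, 20], [30, 40]]))
```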
interpreting a residual plot
curved pattern = a linear model may not be appropriate
small residuals = predictions will be pretty precise
increasing/decreasing spread = predictions for larger/smaller values of x will be more variable
complementary events
two mutually exclusive events whose union is the sample space (rain/not rain, etc.)
binomial distribution: calc usage
exactly x: binompdf(n, p, x)
at most x: binomcdf(n, p, x)
less than x: binomcdf(n, p, x-1)
at least x: 1 - binomcdf(n, p, x-1)
more than x: 1 - binomcdf(n, p, x)
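The calculator recipes above can be mirrored in Python for checking homework answers. This is a sketch: `binompdf` and `binomcdf` below are hypothetical stand-ins written to take the same (n, p, x) arguments as the TI-84 commands, not an official library API, and the example values are made up.

```python
import math

def binompdf(n, p, x):
    """P(X = x) for X ~ B(n, p), with the same argument order as the TI-84 binompdf."""
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binomcdf(n, p, x):
    """P(X <= x) for X ~ B(n, p); returns 0 when x < 0."""
    return sum(binompdf(n, p, k) for k in range(0, min(x, n) + 1))

n, p, x = 10, 0.3, 4  # hypothetical example values
exactly = binompdf(n, p, x)
at_most = binomcdf(n, p, x)
less_than = binomcdf(n, p, x - 1)
at_least = 1 - binomcdf(n, p, x - 1)
more_than = 1 - binomcdf(n, p, x)

for label, value in [("exactly", exactly), ("at most", at_most), ("less than", less_than),
                     ("at least", at_least), ("more than", more_than)]:
    print(f"{label} {x}: {value:.4f}")
```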
CI 4 step process
STATE: parameter, confidence level
PLAN: choose inference method, check conditions
DO: perform calculations, construct the interval
CONCLUDE: interpret the interval in the context of the problem
types of chi-square tests
Goodness of fit: tests whether the distribution of one categorical variable in a single sample matches a hypothesized distribution
Homogeneity: used when you have separate samples from 2 or more indep. pop.s or 2 or more groups in an experiment
Association/Independence: used when you have a single sample from a single population with 2 categorical variables
calc tips: normalcdf and invnorm
normalcdf: min, max, mean, standev
invnorm: area to the left as a decimal, mean, standev
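The same calculations can be reproduced with Python's standard-library `statistics.NormalDist`, whose `cdf` and `inv_cdf` methods play the roles of normalcdf and invnorm. The wrapper names and default arguments below are hypothetical; -1e99 stands in for negative infinity as is common on the calculator.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the N(mean, sd) curve between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

def invnorm(area_left, mean=0.0, sd=1.0):
    """Value that has the given area (as a decimal) to its left under N(mean, sd)."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(area_left)

print(round(normalcdf(-1e99, 1.96), 4))  # area to the left of z = 1.96
print(round(invnorm(0.975), 3))          # z with 97.5% of the area to its left
```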
use of a control group
gives the researchers a comparison group to be used to evaluate the effectiveness of the treatments
conditions for geometric distributions
Binary - each trial can be classified as success/failure
Independent - trials must be independent
Trials - the goal is to count the number of trials until the first success
Success - the probability of success must be the same for each trial
2 sided test from a confidence interval
we do/don't have enough evidence to reject Ho: mu = __ in favor of Ha: mu (does not equal) __ at the a = 0.05 level because __ falls outside/inside the 95% CI.
a = 1 - confidence level (a 95% CI matches a test at a = 0.05)
Conditions: inference for proportions
Random, Normal (at least 10 successes and 10 failures, in both groups if it's 2-sample), Independent (10% condition)
simple random sample (SRS)
a sample taken in such a way that every set of n individuals has an equal chance to be the sample selected
Conditions: binomial distribution
B: binary (success/failure)
I: independent (trials must be independent)
N: number (of trials must be fixed)
S: success (prob. of success must stay constant)
finding sample size given ME
one mean: m = z*(sigma/sq.rt.(n))
proportion: m = z*sq.rt.(p(1-p)/n)
solve for n
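Solving each margin-of-error formula for n and rounding up gives the sketch below. The function names are hypothetical, and using p = 0.5 when no prior estimate of p is available is an assumption here (it is the common conservative choice because it maximizes p(1-p)).

```python
import math
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* for a C% confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def n_for_mean(margin, sigma, confidence=0.95):
    """Smallest n with z*(sigma/sq.rt.(n)) <= margin; always round up."""
    return math.ceil((z_star(confidence) * sigma / margin) ** 2)

def n_for_proportion(margin, p_hat=0.5, confidence=0.95):
    """Smallest n with z*sq.rt.(p(1-p)/n) <= margin; always round up."""
    return math.ceil(z_star(confidence) ** 2 * p_hat * (1 - p_hat) / margin**2)

print(n_for_proportion(0.03))  # 95% confidence, ME of 3 percentage points
print(n_for_mean(2, 10))       # 95% confidence, ME of 2, sigma = 10
```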
mean/standev of a sum of 2 random variables
mean: mu of x+y = mu x + mu y
standev: sigma of x+y = sq.rt. (sigma x^2 + sigma y^2) (only if x and y are independent)
can we generalize the results to the population of interest?
Yes, if a large random sample was taken from the same population we hope to draw conclusions about.
factors that affect power
sample size: increasing the sample size increases power
alpha: a 5% sig. test will have a greater chance of rejecting Ho than a 1% test, so a larger alpha gives more power
alternative farther from mu 0: values of mu in Ha that are close to the hypothesized value are harder to detect than values farther away
transforming data (adding/multiplying by a constant)
adding a to every member of a data set adds a to the measures of center and position but does not change the measures of spread or the shape; multiplying every member by b (b > 0) multiplies the measures of center and position by b and multiplies the measures of spread (range, IQR, standev) by b but does not change the shape
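A quick numerical check of the rule above, using the standard-library `statistics` module. The data values and the constants a = 10, b = 3 are hypothetical; b is positive here (for negative b, spread would be multiplied by |b|).

```python
from statistics import mean, median, stdev

def summary(values):
    """Return (mean, median, standev, range) of a list of numbers."""
    return mean(values), median(values), stdev(values), max(values) - min(values)

data = [2, 4, 4, 5, 7, 9]  # hypothetical data set
a, b = 10, 3

base = summary(data)
shifted = summary([x + a for x in data])  # add a: center/position shift by a, spread unchanged
scaled = summary([b * x for x in data])   # multiply by b: center, position, and spread all scale by b

for name, s in [("original", base), ("x + a", shifted), ("b * x", scaled)]:
    print(name, [round(v, 3) for v in s])
```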
LSRL "SE of b"
measures the standev of the estimated slope for predicting the y-variable from the x-variable/how far the estimated slope will be from the true slope on average
experiment vs obs. study
a study is an experiment ONLY if researchers impose a treatment on the exp. units
mean and standev of a difference of 2 random variables
mean: mu of x-y = mu x - mu y
standev: sigma of x-y = sq.rt. (sigma x^2 + sigma y^2) (only if x and y are independent; variances add even for a difference)
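The sketch below computes the mean and standev of both X + Y and X - Y, assuming X and Y are independent (the standev formulas require it). The function name and the example means/standevs are hypothetical.

```python
import math

def combine(mu_x, sd_x, mu_y, sd_y):
    """Mean and standev of X + Y and X - Y, assuming X and Y are independent."""
    sd_combined = math.sqrt(sd_x**2 + sd_y**2)  # variances add for both sum and difference
    return {
        "sum": (mu_x + mu_y, sd_combined),
        "difference": (mu_x - mu_y, sd_combined),
    }

result = combine(mu_x=50, sd_x=3, mu_y=20, sd_y=4)
print(result)  # {'sum': (70, 5.0), 'difference': (30, 5.0)}
```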
assuming that the null is true, the _______ measures the chance of observing a statistic (or difference) as large or larger than the one observed
mean/standev of a discrete random variable
mean: mu x = sum (xp)
standev: sigma x = sq.rt. (sum (x-mu)^2 p)
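The two formulas above translate directly into Python. The function name and the example probability distribution are hypothetical.

```python
import math

def discrete_mean_sd(values, probs):
    """Mean and standev of a discrete random variable from its probability distribution."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in zip(values, probs))                 # mu x = sum(xp)
    variance = sum((x - mu) ** 2 * p for x, p in zip(values, probs))  # sum((x - mu)^2 p)
    return mu, math.sqrt(variance)

mu, sd = discrete_mean_sd([0, 1, 2, 3], [0.1, 0.3, 0.4, 0.2])
print(f"mean = {mu:.2f}, standev = {sd:.2f}")
```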
bias
systematic favoring of certain outcomes due to flawed sample selection, poor wording, undercoverage, nonresponse, etc.
2-sample t-test phrasing hints and Ho/Ha
key phrase: difference in the means!
Ho: mu1 - mu2 = 0 OR mu1 = mu2
Ha: mu1 - mu2 < 0, > 0, not equal to 0
2-sample t-test conclusion
we do/don't have enough evidence at the a = ? level to conclude that the difference between the mean ? for all ? and the mean ? for all ? is ?
standard deviation
measures spread by giving the typical or average distance that the observations are away from their mean
goal/benefit of blocking
goal: to create groups of homogeneous exp units
benefit: reduces the effect of variation among the exp units (variation due to the blocking variable), so treatments can be compared more precisely
expected value (mean of a random variable)
the long-run average outcome of a random phenomenon carried out a very large number of times
unbiased
data is collected in such a way that there is no systematic tendency to over- or under-estimate the true value of the pop. parameter
paired t-test: phrasing hints and Ho/Ha
key phrase: mean difference
Ho: mu diff = 0
Ha: mu diff < 0, > 0, not equal to 0
paired t-test: conclusion
we do/don't have enough evidence at the a = ? level to conclude that the mean difference in ? for all ? is ?
types of experimental designs
Completely Randomized Design, Randomized Block Design (homogeneous blocks), Matched Pairs
Central Limit Theorem
if the pop. dist. is normal, the samp. dist. will also be normal. also, as n increases, the samp. dist.'s standev will decrease (standev = sigma/sq.rt.(n)).
if the pop.dist. is not normal, the samp. dist. will become more and more normal as n increases.
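A small simulation can illustrate the spread part of the statements above: the standev of sample means shrinks roughly like sigma/sq.rt.(n) as n grows. The population (70% ones, 30% tens, so right-skewed), the number of repetitions, and the seed are all hypothetical choices; checking how normal the shape becomes would need a histogram, which this sketch does not draw.

```python
import random
import statistics

def sd_of_sample_means(population, n, reps=2000, seed=1):
    """Simulate `reps` samples of size n (with replacement) and return the standev of their means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = []
    for _ in range(reps):
        sample = rng.choices(population, k=n)
        means.append(sum(sample) / n)
    return statistics.stdev(means)

# Hypothetical right-skewed population: 70% of values are 1, 30% are 10
population = [1] * 70 + [10] * 30
sigma = statistics.pstdev(population)

for n in (4, 25, 100):
    simulated = sd_of_sample_means(population, n)
    print(f"n = {n:3d}: simulated SD = {simulated:.3f}, sigma/sqrt(n) = {sigma / n ** 0.5:.3f}")
```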
interpreting the confidence level
Intervals produced with this method will capture the true population proportion/mean of ______ in about ___% of all possible samples of this same size from this same population.
Conditions: inference for regression
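linear (the true relationship is linear), independent (10% condition), normal (y-values vary normally around the regression line), equal variance (standev of y is the same for all values of x), random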
LSRL slope 'b'
for each one-unit increase in x, the y variable is predicted to increase/decrease by ____ units.
why large samples give more trustworthy results
large samples yield more precise results than small samples because in a large sample the values of the sample statistic tend to be closer to the true population parameter
confidence interval conclusion
I am ___% confident that the interval from ___ to ___ captures the true _____.