63 terms

AP Stats

outlier
any value that falls more than 1.5 IQR above Q3 or below Q1
residual
y - y hat: the difference between the actual y-value of a point in a scatterplot and the y-value predicted by the LSRL
P(at least 1)
= 1 - P(none)
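A minimal sketch of the complement rule, assuming the trials are independent and share the same success probability (the function name `p_at_least_one` is hypothetical):

```python
def p_at_least_one(p_success: float, trials: int) -> float:
    """P(at least one success) in `trials` independent trials,
    each with success probability `p_success` (independence is assumed)."""
    p_none = (1 - p_success) ** trials  # every trial is a failure
    return 1 - p_none                   # complement rule


# Example: chance of at least one six in 4 rolls of a fair die
print(round(p_at_least_one(1 / 6, 4), 4))  # 0.5177
```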
Mean and StanDev of a Binomial Random Variable
mean: mu of x = np
stan.dev.: sigma of x = sq.root (np(1-p))
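A quick check of the np and sq.rt.(np(1-p)) formulas against the full binomial distribution; n = 10 and p = 0.3 are made-up values and `binomial_mean_sd` is a hypothetical helper name:

```python
import math


def binomial_mean_sd(n: int, p: float) -> tuple:
    """Mean np and standev sqrt(np(1-p)) of X ~ Binomial(n, p)."""
    return n * p, math.sqrt(n * p * (1 - p))


# Cross-check the formulas against the full probability distribution
n, p = 10, 0.3
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
mean_pmf = sum(k * pk for k, pk in enumerate(pmf))
sd_pmf = math.sqrt(sum((k - mean_pmf) ** 2 * pk for k, pk in enumerate(pmf)))
print(binomial_mean_sd(n, p), (mean_pmf, sd_pmf))  # the two pairs agree up to rounding
```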
Significance Test 4-step process
STATE: hypotheses, significance level, parameters
PLAN: choose inference method, check conditions
DO: perform calculations, compute test stat, find P-value
CONCLUDE: interpret the result of your test in the context of your problem
Chi-Square Test df and expected counts
Goodness of Fit: df = #categories - 1, exp counts = sample size x hypo. prop. in each category
Homogeneity/Independence: df = (#rows - 1)(#columns - 1), exp counts = (row total)(column total)/(table total)
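A small sketch of the df and expected-count rules above; the counts and hypothesized proportions are hypothetical examples, not real data:

```python
def gof_expected(sample_size, hypothesized_props):
    """Goodness of fit: expected count = sample size x hypothesized proportion."""
    expected = [sample_size * p for p in hypothesized_props]
    df = len(hypothesized_props) - 1  # number of categories - 1
    return expected, df


def two_way_expected(table):
    """Homogeneity/independence: expected = (row total)(column total)/table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, df


print(gof_expected(120, [0.25, 0.25, 0.5]))    # ([30.0, 30.0, 60.0], 2)
print(two_way_expected([[10, 20], [30, 40]]))  # ([[12.0, 18.0], [28.0, 42.0]], 1)
```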
z-score equation
z = (value - mean) / stan.dev.
interpreting a residual plot
curved pattern = a linear model may not be appropriate
small residuals = predictions from the line will be fairly precise
spread that increases (decreases) as x increases = predictions for larger (smaller) values of x will be more variable
complementary events
two mutually exclusive events whose union is the sample space (rain/not rain etc)
binomial distribution: calc usage
exactly x: binompdf(n, p, x)
at most x: binomcdf(n, p, x)
less than x: binomcdf(n, p, x-1)
at least x: 1 - binomcdf(n, p, x-1)
more than x: 1 - binomcdf(n, p, x)
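The calculator rules above can be mirrored in plain Python; `binompdf` and `binomcdf` here are hypothetical stand-ins named after the calculator commands, not a library API:

```python
from math import comb


def binompdf(n, p, x):
    """P(X = x) for X ~ Binomial(n, p); mirrors the calculator's binompdf."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomcdf(n, p, x):
    """P(X <= x); sums binompdf from 0 up to x (clamped to 0..n)."""
    return sum(binompdf(n, p, k) for k in range(0, min(x, n) + 1))


n, p, x = 10, 0.3, 4                  # hypothetical example values
exactly = binompdf(n, p, x)           # P(X = 4)
at_most = binomcdf(n, p, x)           # P(X <= 4)
less_than = binomcdf(n, p, x - 1)     # P(X < 4) = P(X <= 3)
at_least = 1 - binomcdf(n, p, x - 1)  # P(X >= 4)
more_than = 1 - binomcdf(n, p, x)     # P(X > 4)
```

The `x - 1` versions work because X only takes whole-number values, so "less than 4" is the same event as "at most 3".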
CI 4 step process
STATE parameter, confidence level
PLAN inference method, check conditions
DO calculations
CONCLUDE interpret interval in the context of the problem
types of chi-square tests
Goodness of fit: tests whether the distribution of a categorical variable in one sample matches a hypothesized distribution
Homogeneity: when you have separate samples from 2 or more independent populations, or 2 or more treatment groups in an experiment
Association/Independence: when you have a single sample from a single population, classified by 2 categorical variables
calc tips: normalcdf and invnorm
normalcdf: min, max, mean, standev
invnorm: area to the left as a decimal, mean, standev
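Python's standard library can reproduce both calculator commands with `statistics.NormalDist`; the wrapper names `normalcdf` and `invnorm` are hypothetical, and N(64, 2.5) is a made-up example:

```python
from statistics import NormalDist


def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the N(mean, sd) curve between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)


def invnorm(area_left, mean=0.0, sd=1.0):
    """Value with `area_left` of the N(mean, sd) distribution to its left."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(area_left)


# Hypothetical example: X ~ N(64, 2.5)
print(normalcdf(60, 70, 64, 2.5))            # P(60 < X < 70)
print(normalcdf(70, float("inf"), 64, 2.5))  # P(X > 70); on the calculator use 1E99 as max
print(invnorm(0.90, 64, 2.5))                # 90th percentile
```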
extrapolation
using an LSRL to predict y for x-values outside the range of the explanatory-variable values used to fit the line
use of a control group
gives the researchers a comparison group to be used to evaluate the effectiveness of the treatments
conditions for geometric distributions
Binary - each trial can be classified as success/failure
Independent - trials are independent
Trials - the goal is to count the number of trials until the first success
Success - the probability of success must be the same for each trial
2 sided test from a confidence interval
we do/don't have enough evidence to reject Ho: mu = __ in favor of Ha: mu (does not equal) __ at the a = 0.05 level because __ falls outside/inside the 95% CI.
a = 1 - confidence level (a 95% CI matches a test at a = 0.05)
Conditions: inference for proportions
Random, Normal (at least 10 successes and 10 failures, in each sample if it's a 2-sample procedure), Independent (10% condition)
SOCS
Shape, Outliers, Center, Spread
LSRL y-hat
the estimated or predicted y-value (context) for a given x-value (context)
SRS
a sample taken in such a way that every set of n individuals has an equal chance to be the sample selected
Conditions: binomial distribution
B: binary (each trial is a success/failure)
I: independent (trials are independent)
N: number of trials must be fixed
S: success (prob. of success must stay constant)
finding sample size given ME
one mean: m = z* x sigma/sq.rt.(n)
proportion: m = z* x sq.rt.(p(1-p)/n)
solve for n, then round up to the next whole number
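Solving each margin-of-error formula for n gives the sketch below; the helper names and example inputs (z* = 1.96, sigma = 15, p = 0.5) are hypothetical:

```python
import math


def n_for_mean(z_star, sigma, margin):
    """Smallest n with z* * sigma / sqrt(n) <= margin."""
    return math.ceil((z_star * sigma / margin) ** 2)


def n_for_proportion(z_star, p_guess, margin):
    """Smallest n with z* * sqrt(p(1 - p) / n) <= margin.
    p_guess = 0.5 gives the largest (most conservative) n."""
    return math.ceil(p_guess * (1 - p_guess) * (z_star / margin) ** 2)


print(n_for_proportion(1.96, 0.5, 0.03))  # 1068
print(n_for_mean(1.96, 15, 2))            # 217
```

`math.ceil` does the "round up" step: rounding down would leave the margin of error slightly larger than required.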
Conditions: inference for means
Random (SRS), Normal (population is Normal or n is at least 30), Independent (10% condition)
Describe/Compare Distributions
SOCS; when comparing, use comparative language ('is greater/less than')
LSRL
the line that minimizes the sum of the squared residuals
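A minimal least-squares sketch on a small hypothetical data set, showing that the fitted line has a smaller sum of squared residuals than a line with a nudged slope or intercept:

```python
def lsrl(xs, ys):
    """Least-squares regression line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx           # slope
    a = y_bar - b * x_bar   # intercept: the line passes through (x-bar, y-bar)
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared residuals (y - y-hat)^2 for the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = lsrl(xs, ys)
print(a, b)                                          # about 2.2 and 0.6
print(sse(a, b, xs, ys) < sse(a, b + 0.1, xs, ys))   # True: nudging the slope increases the SSE
```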
Does x CAUSE y?
association is not causation!
mean/standev of a sum of 2 random variables
mean: mu of x+y = mu x + mu y
standev: sigma of x+y = sq.rt. (sigma x^2 + sigma y^2), only if X and Y are independent
can we generalize the results to the population of interest?
Yes, if the sample was randomly selected from the same population we hope to draw conclusions about.
factors that affect power
sample size (increasing the sample size increases power), alpha (a 5% significance test has a greater chance of rejecting Ho than a 1% test), the alternative value (values of mu in Ha that are close to the hypothesized value mu 0 are harder to detect than values farther away)
linear transformations
adding a to every member of a data set adds a to the measures of center and position but does not change the measures of spread or the shape; multiplying every member by a positive constant b multiplies the measures of center and position by b and the measures of spread (range, IQR, standev) by b (the variance by b^2), but does not change the shape
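A quick numerical check of the transformation rules, using a hypothetical data set with a = 10 and a positive scale factor b = 3:

```python
import statistics


def summary(values):
    """Center/position and spread measures for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,
    }


data = [2, 4, 4, 5, 7, 9]   # hypothetical data set
a, b = 10, 3                # hypothetical shift and positive scale factor

original = summary(data)
shifted = summary([x + a for x in data])   # center/position + a; spread unchanged
scaled = summary([b * x for x in data])    # center/position and spread all multiplied by b
for key in original:
    print(key, original[key], shifted[key], scaled[key])
```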
LSRL "SE of b"
estimates the standev of the sampling distribution of the slope, i.e., roughly how far the estimated slope b will typically be from the true slope
experiment vs obs. study
a study is an experiment ONLY if researchers impose a treatment on the exp. units
mean and standev of a difference of 2 random variables
mean: mu of x-y = mu x - mu y
standev: sigma of x-y = sq.rt. (sigma x^2 + sigma y^2), only if X and Y are independent (variances add, even for a difference)
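The sum and difference rules can be checked exactly on two small hypothetical discrete distributions, assuming X and Y are independent (the helper names are hypothetical):

```python
import itertools
import math


def mean_sd(dist):
    """Mean and standev of a discrete distribution {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, math.sqrt(var)


def combine(dist_x, dist_y, op):
    """Distribution of op(X, Y), assuming X and Y are independent."""
    out = {}
    for (x, px), (y, py) in itertools.product(dist_x.items(), dist_y.items()):
        value = op(x, y)
        out[value] = out.get(value, 0) + px * py  # P(X=x and Y=y) = P(X=x)P(Y=y)
    return out


X = {0: 0.2, 1: 0.5, 2: 0.3}   # hypothetical distributions
Y = {1: 0.6, 3: 0.4}

mu_x, sd_x = mean_sd(X)
mu_y, sd_y = mean_sd(Y)
mu_sum, sd_sum = mean_sd(combine(X, Y, lambda x, y: x + y))
mu_diff, sd_diff = mean_sd(combine(X, Y, lambda x, y: x - y))
print(mu_sum, mu_x + mu_y, mu_diff, mu_x - mu_y)
print(sd_sum, sd_diff, math.sqrt(sd_x**2 + sd_y**2))  # all three agree up to rounding
```

The difference X - Y has the same standev as the sum: subtracting still adds variability.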
P-value
assuming that the null hypothesis is true, the _______ is the probability of getting a statistic (or difference) at least as extreme as the one actually observed
type 1 error
rejecting Ho when Ho is actually true
type 2 error
failing to reject Ho when Ho should be rejected
Power
the probability of rejecting Ho when Ho is false (i.e., when a specific alternative value is true)
r
correlation measures the strength and direction of the linear relationship between x and y
stratified random sample vs SRS
stratified guarantees that each of the strata will be represented
mean/standev of a discrete random variable
mean: mu x = sum (xp)
standev: sigma x = sq.rt. (sum (x-mu)^2 p)
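A short sketch of both formulas for a hypothetical probability distribution (the helper name `discrete_mean_sd` is hypothetical):

```python
import math


def discrete_mean_sd(values, probs):
    """mu = sum(x p), sigma = sqrt(sum((x - mu)^2 p))."""
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, math.sqrt(var)


# Hypothetical distribution of X
mu, sigma = discrete_mean_sd([0, 1, 2, 3], [0.1, 0.3, 0.4, 0.2])
print(mu, sigma)  # approximately 1.7 and 0.9
```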
bias
systematic favoring of certain outcomes due to flawed sample selection, poor wording, undercoverage, nonresponse, etc
2-sample t-test phrasing hints and Ho/Ha
key phrase: difference in the means!
Ho: mu1 - mu2 = 0 OR mu1 = mu2
Ha: mu1 - mu2 < 0, > 0, not equal to 0
2-sample t-test conclusion
we do/don't have enough evidence at the a = ? level to conclude that the difference between the mean ? for all ? and the mean ? for all ? is ?
standev
measures spread by giving the typical or average distance that the observations are away from their mean
r-squared
the percent of the variation in y that is explained by the LSRL of y on x
goal/benefit of blocking
goal: to create groups of homogeneous exp units
benefit: reduces the effect of variation among the exp units on the comparison of the treatments
expected value/mean
the long-run average outcome of a random phenomenon carried out a very large number of times
unbiased estimator
a statistic whose sampling distribution is centered at the true value of the pop. parameter, so there is no systematic tendency to over- or underestimate it
paired t-test: phrasing hints and Ho/Ha
key phrase: mean difference
Ho: mu diff = 0
Ha: mu diff < 0, > 0, not equal to 0
paired t-test: conclusion
we do/don't have enough evidence at the a = ? level to conclude that the mean difference in ? for all ? is ?
LSRL y-intercept 'a'
when the x variable is 0, the y variable is predicted to be ______.
experimental designs
Completely Randomized Design, Randomized Block Design (homogeneous blocks), Matched Pairs
probability
the proportion of times the outcome would occur in a long series of repetitions
Central Limit Theorem
if the pop. dist. is normal, the samp. dist. of x-bar will also be normal. also, as n increases, the samp. dist.'s standev (sigma/sq.rt.(n)) will decrease.
if the pop.dist. is not normal, the samp. dist. will become more and more normal as n increases.
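A small simulation sketch of how the standev of x-bar shrinks like sigma/sq.rt.(n), assuming a Uniform(0, 1) population chosen only because it is clearly not normal; the helper name and repetition count are hypothetical choices:

```python
import random
import statistics


def sd_of_sample_means(n, reps=20_000, seed=1):
    """Simulated standev of x-bar for samples of size n from a Uniform(0, 1)
    population (a non-normal population chosen for illustration)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means)


pop_sd = (1 / 12) ** 0.5  # standev of a Uniform(0, 1) population
for n in (1, 4, 25):
    # simulated standev vs. the theoretical sigma / sqrt(n)
    print(n, round(sd_of_sample_means(n), 4), round(pop_sd / n**0.5, 4))
```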
Confidence Interval
Intervals produced with this method will capture the true population mean (or proportion) of ______ in about ___% of all possible samples of this same size from this same population.
Conditions: inference for regression
linear (the true relationship is linear), independent (10%), normal (responses vary normally about the regression line), equal variance (same standev of y for every x), random
LSRL slope 'b'
for every one-unit increase in x, the y variable is predicted to increase/decrease by ____ units.
sampling techniques
SRS, stratified, cluster, census, convenience, voluntary
two events are independent if...
P(B) = P(B | A) or P(B) = P(B | A^c)
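A sketch of checking P(B | A) = P(B) from a two-way table of counts, assuming the table's proportions are the probabilities of interest; the function name and the tables are hypothetical:

```python
from fractions import Fraction


def independent(table):
    """table[i][j] = count of outcomes in category i of A and category j of B.
    Returns True if P(B = j | A = i) == P(B = j) for every cell (exact fractions).
    Assumes every row total is positive."""
    total = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    for row in table:
        row_total = sum(row)
        for j, count in enumerate(row):
            if Fraction(count, row_total) != Fraction(col_totals[j], total):
                return False
    return True


print(independent([[10, 30], [20, 60]]))  # True
print(independent([[10, 30], [30, 50]]))  # False
```

Exact fractions avoid false "not independent" results caused by floating-point rounding.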
why large samples give more trustworthy results
large samples give more precise results than small samples because, in a large sample, the values of the sample statistic tend to be closer to the true population parameter
confidence interval conclusion
I am ___% confident that the interval from ___ to ___ captures the true _____.
Conditions: chi-squared tests
random, all exp. counts at least 5, independent (10%)