# AP Stats

### 63 terms by DaniTaiga

### outlier

any value that falls more than 1.5 x IQR above Q3 or below Q1
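
A minimal Python sketch of the 1.5 x IQR rule, assuming Q1 and Q3 have already been computed (quartile conventions differ between calculators and software). The function names `outlier_fences` and `is_outlier` are illustrative, not part of the original card.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value, q1, q3):
    """True if value lies strictly beyond either fence."""
    low, high = outlier_fences(q1, q3)
    return value < low or value > high


# With Q1 = 10 and Q3 = 20 the fences are -5 and 35.
print(outlier_fences(10, 20))
print(is_outlier(36, 10, 20))
```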

### residual

y - y hat, the difference between the actual y-value in a scatterplot and the predicted y-value in the LSRL

### P(at least one)

= 1 - P(none)

### Mean and StanDev of a Binomial Random Variable

mean: mu of x = np
stan.dev.: sigma of x = sq.root (np(1-p))
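
As a quick check of these two formulas, here is a small Python sketch. The function name `binomial_mean_sd` is made up for illustration.

```python
import math


def binomial_mean_sd(n, p):
    """Mean and standard deviation of a binomial count X with n trials and success prob p."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean, sd


# 100 trials with p = 0.5: mean 50, standard deviation 5
print(binomial_mean_sd(100, 0.5))
```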

### Significance Test 4-step process

STATE: hypotheses, significance level, parameters
PLAN: choose inference method, check conditions
DO: perform calculations, compute test stat, find P-value
CONCLUDE: interpret the result of your test in the context of your problem

### Chi-Square Test df and expected counts

Goodness of Fit: df = #categories - 1, exp counts = sample size x hypo. prop. in each category
Homogeneity/Independence: df = (#columns - 1)(#rows - 1), exp counts: (row total)(column total)/table total
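
The expected-count and df rules can be sketched in Python as follows. The two-way table is assumed to be a list of rows of observed counts; all function names here are hypothetical helpers, not calculator commands.

```python
def gof_expected(sample_size, hypothesized_props):
    """Goodness of fit: expected count = sample size x hypothesized proportion."""
    return [sample_size * p for p in hypothesized_props]


def two_way_expected(table):
    """Homogeneity/independence: (row total)(column total) / table total for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def two_way_df(table):
    """df = (#rows - 1)(#columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


observed = [[10, 20], [30, 40]]
print(two_way_expected(observed))
print(two_way_df(observed))
```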

### z-score equation

z = (value - mean) / stan.dev.

### interpreting a residual plot

curved pattern = linear may not be appropriate
small residuals = predictions will be pretty precise
inc/dec spread = predictions for larger/smaller values of x will be more variable

### complementary events

two mutually exclusive events whose union is the sample space (rain/not rain etc)

### binomial distribution: calc usage

exactly x: binompdf(n, p, x)
at most x: binomcdf(n, p, x)
less than x: binomcdf(n, p, x-1)
at least x: 1 - binomcdf(n, p, x-1)
more than x: 1 - binomcdf(n, p, x)
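
For readers without a TI calculator, the same five cases can be sketched in Python. The functions `binompdf` and `binomcdf` below are stand-ins written to mirror the calculator's argument order (n, p, x); they are not a built-in Python API.

```python
from math import comb


def binompdf(n, p, x):
    """P(X = x) for a binomial count with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomcdf(n, p, x):
    """P(X <= x): add up the pdf from 0 through x (gives 0 if x < 0)."""
    return sum(binompdf(n, p, k) for k in range(x + 1))


n, p, x = 4, 0.5, 2
exactly = binompdf(n, p, x)
at_most = binomcdf(n, p, x)
less_than = binomcdf(n, p, x - 1)
at_least = 1 - binomcdf(n, p, x - 1)
more_than = 1 - binomcdf(n, p, x)
print(exactly, at_most, less_than, at_least, more_than)
```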

### CI 4 step process

STATE parameter, confidence level
PLAN inference method, check conditions
DO calculations
CONCLUDE interpret interval in the context of the problem

### types of chi-square tests

Goodness of fit: test the distribution of one group of sample as compared to a hypothesized distribution
Homogeneity: when you have a sample from 2 or more indep. pop.s or 2 or more groups in an experiment
Association: when you have a single sample from a single population with 2 variables

### calc tips: normalcdf and invnorm

normalcdf: min, max, mean, standev
invnorm: area to the left as a decimal, mean, standev
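
A rough Python equivalent of the two calculator commands, using `statistics.NormalDist` from the standard library (Python 3.8+). The wrapper names `normalcdf` and `invnorm` are chosen here to match the calculator and are not Python built-ins.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


def invnorm(area_left, mean=0.0, sd=1.0):
    """Value with the given area to its left (area as a decimal between 0 and 1)."""
    return NormalDist(mean, sd).inv_cdf(area_left)


print(normalcdf(-1.96, 1.96))  # close to 0.95
print(invnorm(0.975))          # close to 1.96
```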

### extrapolation

using an LSRL to predict outside the domain of the explanatory variable

### use of a control group

gives the researchers a comparison group to be used to evaluate the effectiveness of the treatments

### conditions for geometric distributions

Binomial - can be classified as success/failure
Independent
Trials - goal must be to count #trials until the first success
Success - prob of success must be the same for each trial

### 2 sided test from a confidence interval

we do/don't have enough evidence to reject Ho: mu = __ in favor of Ha: mu (does not equal) __ at the a = 0.05 level because __ falls outside/inside the 95% CI.
a = 1 - confidence level (e.g. 1 - 0.95 = 0.05)

### Conditions: inference for proportions

Random, Normal (at least 10 succ/failures in both groups if it's 2sample), Independent (10% condition)

### LSRL y-hat

the estimated or predicted y-value (context) for a given x-value (context)

### SRS

a sample taken in such a way that every set of n individuals has an equal chance to be the sample selected

### Conditions: binomial distribution

B: binomial (success/failure)
I: independent
N: number (of trials must be fixed)
S: success (prob. of success must stay constant)

### finding sample size given ME

one mean: m = z*(sigma/sq.rt.(n))
proportion: m = z*sq.rt.(p(1-p)/n)
solve for n
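
Solving each margin-of-error formula for n and rounding up can be sketched like this. The sketch uses the standard form m = z*(sigma/sq.rt.(n)) for one mean; the function names are illustrative, and the default guess of p = 0.5 (the most conservative choice) is an assumption, not something stated on the card.

```python
import math


def n_for_mean(z_star, sigma, margin):
    """Smallest n with z* x sigma / sqrt(n) <= margin."""
    return math.ceil((z_star * sigma / margin) ** 2)


def n_for_proportion(z_star, margin, p_guess=0.5):
    """Smallest n with z* x sqrt(p(1-p)/n) <= margin."""
    return math.ceil(z_star**2 * p_guess * (1 - p_guess) / margin**2)


print(n_for_mean(1.96, 15, 2))       # 217
print(n_for_proportion(1.96, 0.03))  # 1068
```

Always round up, since rounding down would give a margin of error slightly larger than the target.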

### Conditions: inference for means

Random, Normal (pop. dist. is normal or n is at least 30), Indep. (10%)

### Describe/Compare Distributions

SOCS, 'is greater/less than' (compare)

### LSRL

the equation for the line that minimizes the sum of the squared residuals
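
A small Python sketch of how the least-squares line y-hat = a + bx and its residuals can be computed by hand, using the standard slope formula b = sum((x - x-bar)(y - y-bar)) / sum((x - x-bar)^2). The function names and the three data points are made up for illustration.

```python
def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x-bar, y-bar)
    return a, b


def residuals(xs, ys, a, b):
    """Residual = actual y minus predicted y-hat."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs, ys = [1, 2, 3], [2, 4, 7]
a, b = lsrl(xs, ys)
print(a, b)
print(residuals(xs, ys, a, b))  # residuals from an LSRL sum to (about) zero
```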

### Does x CAUSE y?

association is not causation!

### mean/standev of a sum of 2 random variables

mean: mu of x+y = mu x + mu y
standev: sigma of x+y = sq.rt. (sigma x^2 + sigma y^2) (only if x and y are independent)

### can we generalize the results to the population of interest?

Yes, if a large random sample was taken from the same population we hope to draw conclusions about.

### factors that affect power

sample size (inc sample size = inc power), alpha (a 5% sig. test will have a greater chance of rejecting Ho than a 1% test), consider an alternative farther from mu 0 (values of mu that are in Ha but lie close to the hyp. value are harder to detect than ones farther away)

### linear transformations

adding a to every member of a data set adds a to the measures of position but does not change the measures of spread or the shape
multiplying every member by b multiplies the measures of position by b and multiplies most measures of spread by |b| but does not change the shape

### LSRL "SE of b"

measures the standev of the estimated slope for predicting the y-variable from the x-variable/how far the estimated slope will be from the true slope on average

### experiment vs obs. study

a study is an experiment ONLY if researchers impose a treatment on the exp. units

### mean and standev of a difference of 2 random variables

mean: mu of x-y = mu x - mu y
standev: sigma of x-y = sq.rt. (sigma x^2 + sigma y^2) (variances ADD even for a difference; only if x and y are independent)
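
A short Python sketch for combining two random variables, assuming X and Y are independent. Means add or subtract, but variances add in both cases, so the sum and the difference share the same standard deviation. The function name `combine_independent` is hypothetical.

```python
import math


def combine_independent(mu_x, sd_x, mu_y, sd_y):
    """Mean and SD of X + Y and X - Y, assuming X and Y are independent."""
    sd = math.sqrt(sd_x**2 + sd_y**2)  # variances add for both sum and difference
    return {
        "sum": (mu_x + mu_y, sd),
        "difference": (mu_x - mu_y, sd),
    }


# mu x = 10, sigma x = 3, mu y = 4, sigma y = 4
print(combine_independent(10, 3, 4, 4))
```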

### P-value

assuming that the null is true, the _______ measures the chance of observing a statistic (or difference) at least as extreme as the one observed

### type 1 error

rejecting Ho when Ho is actually true

### type 2 error

failing to reject Ho when Ho is actually false

### Power

the probability of rejecting Ho when Ho is actually false (= 1 - P(type 2 error))

### r

correlation measures the strength and direction of the linear relationship between x and y

### stratified random sample vs SRS

stratified guarantees that each of the strata will be represented

### mean/standev of a discrete random variable

mean: mu x = sum (xp)
standev: sigma x = sq.rt. (sum ((x-mu)^2 p))
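
A minimal Python sketch of these two calculations, with the squared deviation (x - mu)^2 weighted by each probability. The function name and the example distribution are illustrative only.

```python
import math


def discrete_mean_sd(values, probs):
    """Mean and SD of a discrete random variable from its probability table."""
    mean = sum(x * p for x, p in zip(values, probs))
    variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, math.sqrt(variance)


# X = number of heads in 2 fair coin flips
print(discrete_mean_sd([0, 1, 2], [0.25, 0.5, 0.25]))
```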

### bias

systematic favoring of certain outcomes due to flawed sample selection, poor wording, undercoverage, nonresponse, etc

### 2-sample t-test phrasing hints and Ho/Ha

key phrase: difference in the means!
Ho: mu1 - mu2 = 0 OR mu1 = mu2
Ha: mu1 - mu2 < 0, > 0, not equal to 0

### 2-sample t-test conclusion

we do/don't have enough evidence at the a = ? level to conclude that the difference between the mean ? for all ? and the mean ? for all ? is ?

### standev

measures spread by giving the typical or average distance that the observations are away from their mean

### r-squared

% of the variation in y that is explained by the LSRL of y on x

### goal/benefit of blocking

goal: to create groups of homogeneous exp units
benefit: the reduction of the effect of variation within the exp units

### expected value/mean

the long-run average outcome of a random phenomenon carried out a very large number of times

### unbiased estimator

data is collected in such a way that there is no systematic tendency to over- or under-estimate the true value of the pop. parameter

### paired t-test: phrasing hints and Ho/Ha

key phrase: mean difference
Ho: mu diff = 0
Ha: mu diff < 0, > 0, not equal to 0

### paired t-test: conclusion

we do/don't have enough evidence at the a = ? level to conclude that the mean difference in ? for all ? is ?

### LSRL y-intercept 'a'

when the x variable is 0, the y variable is estimated to be ______.

### experimental designs

Completely Randomized Design, Randomized Block Design (homogeneous blocks), Matched Pairs

### probability

the proportion of times the outcome would occur in a long series of repetitions

### Central Limit Theorem

if the pop. dist. is normal, the samp. dist. will also be normal. also, as n increases, the samp. dist.'s standev will decrease.
if the pop.dist. is not normal, the samp. dist. will become more and more normal as n increases.
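
The effect of n on the spread of the sampling distribution of the sample mean follows from the standard formula sigma / sq.rt.(n). A tiny Python sketch (the function name is made up):

```python
import math


def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sampling distribution of x-bar."""
    return sigma / math.sqrt(n)


# Population standev 10: quadrupling n cuts the spread in half.
for n in (25, 100, 400):
    print(n, sd_of_sample_mean(10, n))
```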

### Confidence Interval

Intervals produced with this method will capture the true population parameter (mean or proportion) of ______ in about ___% of all possible samples of this same size from this same population.

### Conditions: inference for regression

independent (10%), normal (regression), equal variance (around regression), random

### LSRL slope 'b'

for every one unit change in x the y variable is predicted to increase/decrease by ____ units.

### sampling techniques

SRS, stratified, cluster, census, convenience, voluntary

### two events are independent if...

P(B) = P(B|A) or P(B) = P(B|A^c)
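
One way to check this condition from known probabilities is to compute P(B given A) = P(A and B) / P(A) and compare it with P(B). The sketch below assumes P(A) > 0 and uses a small tolerance for floating-point rounding; the function name is hypothetical.

```python
def is_independent(p_a_and_b, p_a, p_b, tol=1e-9):
    """True if P(B given A) equals P(B), up to rounding. Assumes p_a > 0."""
    p_b_given_a = p_a_and_b / p_a
    return abs(p_b_given_a - p_b) <= tol


print(is_independent(0.12, 0.3, 0.4))  # P(B given A) = 0.4 = P(B)
print(is_independent(0.20, 0.3, 0.4))  # P(B given A) is about 0.667, not 0.4
```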

### why large samples give more trustworthy results

large samples yield more precise results than small samples because in a large sample the values of the sample statistic tend to be closer to the true population parameter

### confidence interval conclusion

I am ___% confident that the interval from ___ to ___ captures the true _____.

### Conditions: chi-squared tests

random, all exp. counts at least 5, independent (10%)
