AP Stats Flash Cards for the Year
Terms in this set (60)
Interpret Standard Deviation
Standard deviation measures spread by giving the "typical" or "average" distance that the observations (context) are away from their (context) mean.
Outlier Rule
Upper Bound = Q3 + 1.5(IQR)
Lower Bound = Q1 - 1.5(IQR)
IQR = Q3 - Q1
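As a rough illustration of the outlier rule, here is a minimal Python sketch. The function names and the data are hypothetical, and the quartiles come from Python's "inclusive" method, which is only one of several quartile conventions and may not match a calculator's Q1 and Q3 exactly.

```python
from statistics import quantiles

def outlier_fences(data):
    """Return (lower_bound, upper_bound) from the 1.5 * IQR rule."""
    # quantiles(..., n=4) returns [Q1, median, Q3].
    # Assumption: the 'inclusive' quartile method; calculators may differ.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data):
    """List every value that falls outside the fences."""
    lower, upper = outlier_fences(data)
    return [x for x in data if x < lower or x > upper]

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
print(outlier_fences(scores))  # (-3.5, 14.5)
print(find_outliers(scores))   # [30]
```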
Linear Transformations
Adding "a" to every member of a data set adds "a" to the measures of position, but does not change the measures of spread or the shape
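A quick numerical check of this card, sketched in Python with made-up data: adding a constant shifts the mean and median but leaves the standard deviation and range unchanged.

```python
from statistics import mean, median, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set
a = 10                           # constant added to every value
shifted = [x + a for x in data]

# Measures of position move by a.
print(mean(data), mean(shifted))      # 5 15
print(median(data), median(shifted))  # 4.5 14.5

# Measures of spread do not change.
print(pstdev(data), pstdev(shifted))  # 2.0 2.0
print(max(data) - min(data), max(shifted) - min(shifted))  # 7 7
```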
Describe the Distributions
or
Compare the Distributions
Shape, Outlier, Center, Spread
Only discuss outliers if there are obvious outliers present. Be sure to address SCS in context!
If it says "Compare"
YOU MUST USE comparison phrases like "is greater than" or "is less than" for Center & Spread
SOCS
Shape - Skewed Left (Mean<Median); Skewed Right (Mean>Median); Fairly Symmetric (Mean≈Median)
Outliers - Discuss them if there are obvious ones
Center - Mean or Median
Spread - Range, IQR, or Standard Deviation
Note: Also be on the lookout for gaps, clusters or other unusual features of the data set. Make observations!
Using Normalcdf and Invnorm
(Calculator Tips)
Normalcdf (min, max, mean, standard deviation)
Invnorm (area to the left as a decimal, mean, standard deviation)
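For readers without the calculator at hand, the same two operations can be approximated with Python's standard library. This is only a sketch: the function names normalcdf and invnorm are chosen here to mirror the calculator commands and are not part of any Python library.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the Normal curve between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

def invnorm(area_left, mean=0.0, sd=1.0):
    """Value with the given area (as a decimal) to its left."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(area_left)

# About 68% of a Normal distribution lies within 1 SD of the mean.
print(round(normalcdf(-1, 1), 4))  # 0.6827
# The 97.5th percentile of the standard Normal distribution.
print(round(invnorm(0.975), 2))    # 1.96
```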
Interpret a z-score
(value-mean)/(standard deviation)
A z-score describes how many standard deviations a value or statistic (x, x bar, p hat, etc.) falls away from the mean of the distribution and in what direction. The further the z-score is away from zero the more "surprising" the value of the statistic is.
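The formula is short enough to check by hand, but a small Python sketch makes the direction and size explicit. The test score, mean, and standard deviation below are hypothetical.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Hypothetical example: a score of 85 when the mean is 70 and the SD is 10.
print(z_score(85, 70, 10))  # 1.5  (1.5 SDs above the mean)
print(z_score(55, 70, 10))  # -1.5 (1.5 SDs below the mean)
```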
What is an Outlier?
When given 1 variable data:
An outlier is any value that falls more than 1.5(IQR) above Q3 or below Q1
Regression Outlier:
Any value that falls outside the pattern of the rest of the data.
Interpret LSRL Slope "b"
For every one-unit increase in the x variable (context), the y variable (context) is predicted to increase/decrease by _____ units (context).
Interpret LSRL y-intercept "a"
When the x variable (context) is zero, the y variable (context) is estimated to be (put value here).
Paired t-test
Phrasing Hints,
Ho and Ha,
Conclusion
Key Phrase: Mean Difference
Ho: μDiff=0
Ha: μDiff<0, >0, ≠0
μDiff = The mean difference in ______ for all ______.
We do/(do not) have enough evidence at the 0.05 level to conclude that the mean difference in _____ for all _____ is _____.
Two sample t-test
Phrasing Hints,
Ho and Ha,
conclusion
Key Phrase: DIFFERENCE IN MEANS
Ho: μ1-μ2=0 OR μ1=μ2
Ha: μ1-μ2<0, >0, ≠0
μ1-μ2= The difference between the mean ______ for all ______ and the mean _____ for all ______.
We do/(do not) have enough evidence at the 0.05 level to conclude that the difference between the mean _____ for all ______ and the mean ______ for all _____ is ______.
Type I Error,
Type II Error,
& Power
1. Type I Error: Rejecting Ho when Ho is actually true. (Ex. Convicting an innocent person)
2. Type II Error: Failing to (II) reject Ho when Ho should be rejected. (Ex. Letting a guilty person go free)
3. Power: Probability of rejecting Ho when Ho should be rejected. (Rejecting Correctly)
Factors that affect Power
1. Sample Size: To increase power, increase sample size.
2. Increase α: A 5% test of significance will have a greater chance of rejecting the null than a 1% test.
3. Consider an alternative that is farther away from μ0: values of μ that are in Ha but lie close to the hypothesized value are harder to detect than values of μ that are far from μ0.
Inference for Means (conditions)
Random: Data from a random sample(s) or randomized experiment
Independent: Independent observations and independent samples/groups; 10% condition if sampling without replacement.
Normal: Population distribution is normal or large sample(s) (n≥30 or n1≥30 and n2≥30)
Inference for Proportions (conditions)
Random: Data from a random sample(s) or randomized experiment
Independent: Independent observations and independent samples/groups; 10% condition if sampling without replacement.
Normal: At least 10 successes and failures (in both groups, for a two sample problem)
Types of Chi-Square Tests
1. Goodness of Fit: Use to test the distribution of one group or sample as compared to a hypothesized distribution.
2. Homogeneity: Use when you have a sample from 2 or more independent populations or 2 or more groups in an experiment. Each individual must be classified based upon a single categorical variable.
3. Association/Independence: Use when you have a single sample from a single population. Individuals in the sample are classified by two categorical variables.
Chi-Square Tests
df and Expected Counts
1. Goodness of Fit:
df = # of categories - 1
Expected Counts: Sample size times hypothesized proportion in each category.
2. Homogeneity or Association/Independence:
df = (# of rows - 1)(# of columns - 1)
Expected Counts: (row total)(column total)/(table total)
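The df and expected-count rules above can be sketched in Python as follows. The function names and the counts are hypothetical, and the two-way table is assumed to be a list of rows with no totals included.

```python
def gof_expected(sample_size, hypothesized_props):
    """Goodness of fit: sample size times each hypothesized proportion."""
    return [sample_size * p for p in hypothesized_props]

def two_way_expected(observed):
    """Two-way table: (row total)(column total)/(table total) per cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def two_way_df(observed):
    """df = (# of rows - 1)(# of columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)

table = [[10, 20], [30, 40]]             # hypothetical observed counts
print(two_way_expected(table))           # [[12.0, 18.0], [28.0, 42.0]]
print(two_way_df(table))                 # 1
print(gof_expected(60, [0.5, 0.25, 0.25]))  # [30.0, 15.0, 15.0]; df = 3 - 1 = 2
```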
Inference for Counts
(Chi-Squared Tests)
(Conditions)
Random: Data from a random sample(s) or randomized experiment
Large Sample Size: All expected counts are at least 5
Independent: Independent observations and independent samples/groups; 10% condition if sampling without replacement
Inference for Regression (conditions)
Linear: True relationship between the variables is linear.
Independent Observations: 10% condition if sampling without replacement
Normal: Responses vary normally around the regression line for all x-values
Equal Variance: around the regression line for all x-values
Random: Data from a random sample or randomized experiment
Goals of blocking / Benefits of blocking
The goal of blocking is to create groups of homogeneous experimental units.
The benefit of blocking is the reduction of the effect of variation within the experimental units. (context)
Advantage of using a stratified random sample over SRS.
Stratified random sampling guarantees that each of the strata will be represented. When strata are chosen properly, it will produce better (less variable/more precise) information than an SRS of the same sample size.
Experimental or Observational study?
A study is experimental ONLY if researchers IMPOSE a treatment upon the experimental units.
Does ___ cause ____?
Association IS NOT Causation!
An observed association, no matter the strength, does not indicate causation. Only a well-designed, controlled experiment can lead to a conclusion of cause and effect.
SRS
A simple random sample (SRS) is a sample taken in such a way that every set of n individuals has an equal chance to be selected for the sample.
Why use a control group.
A control group gives the researcher a comparison group to evaluate the effectiveness of the treatment(s).
Complementary events
2 mutually exclusive events whose union is the sample space.
EX:
rain/no rain ,
draw at least one heart / draw no hearts
P(at least one)
P(at least one) = 1-P(none)
EX. P(at least 1 six in 3 rolls) = 1-(5/6)^3 = 0.4213
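The dice example can be reproduced with a short Python sketch. The function name is hypothetical, and the calculation assumes the trials are independent with the same probability on each trial.

```python
def p_at_least_one(p_single, n_trials):
    """P(at least one success) = 1 - P(no successes in n independent trials)."""
    return 1 - (1 - p_single) ** n_trials

# At least one six in three rolls of a fair die.
p = p_at_least_one(1 / 6, 3)
print(round(p, 4))  # 0.4213
```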
Two events are independent IF...
knowing that event A has occurred does not change the probability of event B, that is, P(B|A) = P(B)
Interpreting probability
The probability of any outcome of a random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions. Probability is a long term relative frequency.
Interpret r²
______% of the variation in y (context) is accounted for by the LSRL of y (context) on x (context).
or
______% of the variation in y (context) is accounted for by using the linear regression model with x (context) as the explanatory variable
Interpret r
Correlation measures the strength and direction of the linear relationship between x and y
- r is always between -1 and 1
-close to zero= very weak
-close to 1 or -1 = stronger
-Exactly 1 or -1 = perfectly straight line
-Positive r =positive correlation
-Negative r = negative correlation
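To make r and r² concrete, here is a minimal Python sketch that computes both from paired data. The function name and the data are hypothetical; on a calculator the same values come from the linear regression output.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r between paired lists xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]  # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]  # hypothetical response values
r = correlation(xs, ys)
print(round(r, 4))       # 0.7746 (positive, moderately strong)
print(round(r ** 2, 2))  # 0.6, so about 60% of the variation in y is accounted for
```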
Interpret LSRL "SEb"
SEb measures the standard deviation of the estimated slope for predicting the y variable (context) from the x variable (context)
SEb measures how far the estimated slope will be from the true slope, on average
Interpret LSRL "s"
s = _______ is the standard deviation of the residuals.
It measures the typical distance between the actual y-values (context) and their predicted y-values (context)
Interpret LSRL "y-hat"
y-hat is the "estimated" or "predicted" y-value (context) for a given x-value (context)
Extrapolation
Using a LSRL to predict outside the domain of the explanatory variable.
(Can lead to ridiculous conclusions if the current linear trend does not continue)
Interpreting a Residual Plot
1. Is there a curved pattern?
IF so, a linear model may not be appropriate.
2. Are the residuals small in size?
IF so, predictions using the linear model will be fairly precise.
3. Is there increasing (or decreasing) spread?
IF so, prediction for larger (smaller) values of x will be more variable.
What is a Residual?
Residual= y-y-hat
A residual measures the difference between the actual (observed) y-value in a scatterplot and the y-value that is predicted by the LSRL using its corresponding x-value.
In calculator: L3 = L2 - L1 (with the actual y-values in L2 and the predicted y-values in L1)
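A sketch of the residual calculation in Python, under the assumption that the LSRL is fit by the usual least-squares formulas. All function names and data are hypothetical. The last function also computes s, the standard deviation of the residuals, using n - 2 in the denominator.

```python
from math import sqrt

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

def residuals(xs, ys):
    """Residual = actual y minus predicted y (y - y-hat)."""
    a, b = lsrl(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def residual_sd(xs, ys):
    """s = sqrt(sum of squared residuals / (n - 2))."""
    res = residuals(xs, ys)
    return sqrt(sum(e ** 2 for e in res) / (len(res) - 2))

xs = [1, 2, 3, 4, 5]  # hypothetical x-values
ys = [2, 4, 5, 4, 5]  # hypothetical y-values
print([round(e, 1) for e in residuals(xs, ys)])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(round(residual_sd(xs, ys), 4))             # 0.8944
```

The residuals from a least-squares line always sum to zero (up to rounding), which is a handy check on the arithmetic.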
Sampling Techniques
1. SRS- Number the entire population, draw numbers from a hat (every set of n individuals has an equal chance of selection)
2. Stratified- Split the population into homogeneous groups, select an SRS from each group.
3. Cluster- Split the population into heterogeneous groups called clusters, and randomly select whole clusters for the sample. Ex. Choosing a carton of eggs actually chooses a cluster (group) of 12 eggs.
4. Census- An attempt to reach the entire population
5. Convenience- Select the individuals easiest to reach
6. Voluntary Response- People choose themselves by responding to a general appeal
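The three random techniques in this list (SRS, stratified, cluster) can be sketched in Python with the standard random module. The function names, the population, and the grouping below are all hypothetical, and the seed is fixed only so the example is repeatable.

```python
import random

def srs(population, n, rng):
    """SRS: every set of n individuals is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified: take an SRS of the same size from every stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Cluster: randomly pick whole clusters and keep everyone in them."""
    chosen = rng.sample(clusters, n_clusters)
    return [individual for cluster in chosen for individual in cluster]

rng = random.Random(42)
students = list(range(1, 101))                      # hypothetical population of 100
strata = {"juniors": students[:50], "seniors": students[50:]}
cartons = [students[i:i + 10] for i in range(0, 100, 10)]  # 10 clusters of 10

print(len(srs(students, 20, rng)))               # 20
print(len(stratified_sample(strata, 10, rng)))   # 20, with 10 from each stratum
print(len(cluster_sample(cartons, 2, rng)))      # 20, from 2 whole clusters
```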
Experimental Design
1. CRD (Completely Randomized Design)- All experimental units are allocated at random among all treatments
2. RBD (Randomized Block Design)- Experimental units are put into homogeneous blocks. The random assignment of the units to the treatments is carried out separately within each block.
3. Matched Pairs- A form of blocking in which each subject receives both treatments in a random order or the subjects are matched in pairs as closely as possible and one subject in each pair receives each treatment, determined at random
Interpreting Expected Value/Mean
The mean/expected value of a random variable is the long-run average outcome of a random phenomenon carried out a very large number of times.
Mean and Standard Deviation of a Discrete Random Variable
(also on the formula sheet)
Mean (Expected Value):
μx=∑(xi)(pi)
(Multiply & add across the table)
Standard Deviation:
σX=√(∑(xi-μx)²pi)
Square root of the sum of (Each x value-the mean)²(its probability)
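A minimal Python sketch of both calculations, assuming the probability distribution is given as two parallel lists. The function names and the distribution are hypothetical.

```python
from math import sqrt

def rv_mean(values, probs):
    """Expected value: multiply each value by its probability and add."""
    return sum(x * p for x, p in zip(values, probs))

def rv_sd(values, probs):
    """Square root of the sum of (value - mean)^2 * probability."""
    mu = rv_mean(values, probs)
    return sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))

# Hypothetical distribution: number of heads in two fair coin flips.
values = [0, 1, 2]
probs = [0.25, 0.5, 0.25]
print(rv_mean(values, probs))          # 1.0
print(round(rv_sd(values, probs), 4))  # 0.7071
```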
Mean and Standard Deviation of a Difference of Two Random Variables
Mean of a difference of 2 RV's:
μX-Y=μX-μY
Stdev of a Difference of 2 Indep RV's:
σX-Y=√((σx)²+(σY)²)
Stdev of a Difference of 2 Dependent RV's
Cannot be determined because it depends on how strongly they are correlated.
Mean and Standard Deviation of a Sum of Two Random Variables
Mean of a sum of 2 RV's:
μX+Y=μX+μY
Stdev of a Sum of 2 Indep RV's:
σX+Y=√((σx)²+(σY)²)
Stdev of a Sum of 2 Dependent RV's
Cannot be determined because it depends on how strongly they are correlated.
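The rules for sums and differences of independent random variables can be sketched in Python as below. The function names and numbers are hypothetical. Note that the variances add in both cases, so the standard deviation is the same for X+Y and X-Y.

```python
from math import sqrt

def mean_of_sum(mu_x, mu_y):
    return mu_x + mu_y

def mean_of_difference(mu_x, mu_y):
    return mu_x - mu_y

def sd_of_sum_or_difference(sd_x, sd_y):
    """Valid only when X and Y are independent: variances add."""
    return sqrt(sd_x ** 2 + sd_y ** 2)

# Hypothetical independent random variables X and Y.
mu_x, sd_x = 20, 3
mu_y, sd_y = 12, 4
print(mean_of_sum(mu_x, mu_y))              # 32
print(mean_of_difference(mu_x, mu_y))       # 8
print(sd_of_sum_or_difference(sd_x, sd_y))  # 5.0 (not 3 + 4 and not 3 - 4)
```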
Binomial Distribution (Conditions)
1. Binary? Trials can be classified as success/failure
2. Independent? Trials must be independent
3. Number? The number of trials (n) must be fixed in advance
4. Success? The probability of success (p) must be the same for each trial
Geometric Distribution (Conditions)
1. Binary? Trials can be classified as success/failure
2. Independent? Trials must be independent
3. Trials? The goal is to count the number of trials until the first success occurs
4. Success? The probability of success (p) must be the same for each trial
Binomial Distribution (Calculator Usage)
Exactly 5: P(X=5)= Binompdf(n,p,5)
At Most 5: P(X≤5)= Binomcdf(n,p,5)
Less Than 5: P(X<5)= Binomcdf(n,p,4)
At Least 5: P(X≥5)= 1-Binomcdf(n,p,4)
More Than 5: P(X>5)= 1-Binomcdf(n,p,5)
Remember to define X, n, and p!
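The calculator commands above can be mirrored in Python with a short sketch. The functions binompdf and binomcdf are written here from the binomial formula to imitate the calculator; they are not library functions, and the values n = 10 and p = 0.5 are hypothetical.

```python
from math import comb

def binompdf(n, p, k):
    """P(X = k) for a binomial random variable."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomcdf(n, p, k):
    """P(X <= k): add binompdf for 0 through k."""
    return sum(binompdf(n, p, i) for i in range(k + 1))

n, p = 10, 0.5  # hypothetical: X = number of heads in 10 fair coin flips
exactly_5 = binompdf(n, p, 5)
at_most_5 = binomcdf(n, p, 5)
less_than_5 = binomcdf(n, p, 4)
at_least_5 = 1 - binomcdf(n, p, 4)
more_than_5 = 1 - binomcdf(n, p, 5)

print(round(exactly_5, 4))    # 0.2461
print(round(at_most_5, 4))    # 0.623
print(round(less_than_5, 4))  # 0.377
print(round(at_least_5, 4))   # 0.623
print(round(more_than_5, 4))  # 0.377
```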
Mean and Standard Deviation of a Binomial Random Variable
(also on the formula sheet!)
Mean: μx=np
Standard Deviation: σx=√(np(1-p))
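As a small worked check of these two formulas, here is a Python sketch with hypothetical values n = 100 and p = 0.5.

```python
from math import sqrt

def binomial_mean(n, p):
    """Mean of a binomial random variable: np."""
    return n * p

def binomial_sd(n, p):
    """Standard deviation of a binomial random variable: sqrt(np(1-p))."""
    return sqrt(n * p * (1 - p))

print(binomial_mean(100, 0.5))  # 50.0
print(binomial_sd(100, 0.5))    # 5.0
```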
Why Large Samples Give More Trustworthy Results...
(when collected appropriately)
When collected appropriately, large samples yield more precise results than small samples because in a large sample the values of the sample statistic tend to be closer to the true population parameter.
The Sampling Distribution of the Sample Mean (Central Limit Theorem)
1. If the population distribution is Normal the sampling distribution will also be Normal with the same mean as the population. Additionally, as n increases the sampling distribution's standard deviation will decrease.
2. If the population distribution is not Normal the sampling distribution will become more and more Normal as n increases. The sampling distribution will have the same mean as the population and as n increases the sampling distribution's standard deviation will decrease.
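A small simulation can illustrate both points. This Python sketch assumes a right-skewed (exponential) population with mean 1, which is a choice made here for illustration only, and uses a fixed seed so the run is repeatable.

```python
import random
from statistics import mean, stdev

def simulate_sample_means(n, reps=2000, seed=0):
    """Draw reps samples of size n and return the list of sample means."""
    rng = random.Random(seed)
    # Assumption: skewed population, exponential with mean 1 and SD 1.
    return [mean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

means_n4 = simulate_sample_means(4)
means_n25 = simulate_sample_means(25)

# Both centers should be close to the population mean of 1.
print(round(mean(means_n4), 2), round(mean(means_n25), 2))
# The spread shrinks as n grows (theory: 1/sqrt(n), so 0.5 versus 0.2).
print(round(stdev(means_n4), 2), round(stdev(means_n25), 2))
```

A histogram of means_n25 would also look noticeably more symmetric than one of means_n4, even though the population itself is skewed.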
Unbiased Estimator
The data is collected in such a way that there is no systematic tendency to overestimate or underestimate the true value of the population parameter.
(The mean of the sampling distribution equals the true value of the parameter being estimated)
Bias
The systematic favoring of certain outcomes due to flawed sample selection, poor question wording, undercoverage, nonresponse, etc.
Explain a P-value
Assuming that the null is true (context), the P-value measures the chance of observing a statistic (or difference in statistics) (context) as extreme as or more extreme than the one actually observed.
Can we generalize the results to the population of interest?
Yes, if:
A large random sample was taken from the same population we hope to draw conclusions about.
Carrying out a Two-Sided Test from a Confidence Interval
We do/(do not) have enough evidence to reject Ho: μ=? in favor of Ha: μ≠? at the α=0.05 level because ? falls outside/(inside) the 95% Confidence Interval.
α=1-confidence level
Finding the Sample Size
(For a given margin of error)
For one mean: m=z*(σ/√(n))
For one proportion: m=z*√((p(1-p))/n)
If an estimate of p is not given, use 0.5 for p. Solve for n.
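Solving the margin-of-error formulas for n can be sketched in Python as follows. The function names and the example numbers are hypothetical, z* is taken from the standard Normal distribution, and the result is always rounded up to the next whole individual.

```python
from math import ceil
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* for a two-sided interval at the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_for_mean(margin, sigma, confidence=0.95):
    """Solve m = z*(sigma/sqrt(n)) for n, rounding up."""
    return ceil((z_star(confidence) * sigma / margin) ** 2)

def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Solve m = z*sqrt(p(1-p)/n) for n, rounding up; p defaults to 0.5."""
    return ceil(z_star(confidence) ** 2 * p * (1 - p) / margin ** 2)

# Hypothetical targets: margin of 2 with sigma = 10, and margin of 0.03.
print(sample_size_for_mean(2, 10))       # 97
print(sample_size_for_proportion(0.03))  # 1068
```

Using p = 0.5 gives the largest possible value of p(1-p), so the resulting n is safe no matter what the true proportion is.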
4-Step Process
Confidence Intervals
STATE: what parameter do you want to estimate, and at what confidence level?
PLAN: Choose the appropriate inference method. Check conditions.
DO: If the conditions are met, perform calculations.
CONCLUDE: Interpret your interval in the context of the problem.
4-Step Process
Significance Tests
STATE: What hypotheses do you want to test, and at what significance level? Define any parameters you use.
PLAN: Choose the appropriate inference method. Check conditions.
DO: If the conditions are met, perform calculations. Compute the test statistic and find the P-value.
CONCLUDE: Interpret the result of your test in the context of the problem.
Interpreting a Confidence Interval
(Not a Confidence Level)
I am ____% confident that the interval from _____ to _____ captures the true ______.
Interpreting a Confidence Level
(The Meaning of 95% Confident)
Intervals produced with this method will capture the true population ______ in about 95% of all possible samples of this same size from this same population.