# AP Stats

Data Systematically recorded information
Population All cases we wish we knew about
Sample All cases we actually examine
Categorical Variable A variable that names categories
Quantitative Variable A variable whose values are numbers that measure a quantity, usually with units
Frequency Table A table that lists the categories in a categorical variable and gives the count of observations for each category
Area Principle Each data value should be represented by the same amount of area
Marginal Distribution In a contingency table, the distribution of either variable alone
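Contingency Table Displays counts (and often percentages) of individuals falling into categories on two or more variables
Simpson's Paradox A trend that appears within each of several groups can disappear or reverse when the groups are combined, so averages taken within different groups can appear to contradict the overall average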
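
A small sketch of the paradox, using invented success counts for two hypothetical treatments, A and B, in two groups. The numbers are made up for illustration only.

```python
# Hypothetical (invented) counts: treatment -> group -> (successes, trials)
data = {
    "A": {"group 1": (9, 10), "group 2": (30, 100)},
    "B": {"group 1": (80, 100), "group 2": (2, 10)},
}


def rate(successes, trials):
    """Success rate as a proportion."""
    return successes / trials


def overall_rate(groups):
    """Pool successes and trials across groups, then take the rate."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(t for _, t in groups.values())
    return total_successes / total_trials


for treatment, groups in data.items():
    for name, (s, t) in groups.items():
        print(f"{treatment}, {name}: {rate(s, t):.1%}")
    print(f"{treatment}, overall: {overall_rate(groups):.1%}")
```

A has the higher rate in both groups (90% vs 80%, 30% vs 20%), yet B has the higher overall rate (about 74.5% vs 35.5%). The reversal happens because A was mostly used in group 2, where success is rare for both treatments, so the group acts as a lurking variable.
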
Histogram Uses adjacent bars to show the distribution of a quantitative variable
Stem and Leaf Display Shows quantitative data values in a way that sketches the distribution of the data
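
One way to build a stem-and-leaf display in code, assuming non-negative whole-number data where the tens part is the stem and the ones digit is the leaf. Other splits are possible; this is just one convention, and the quiz scores are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Build a stem-and-leaf display for non-negative integers.

    The stem is value // 10 and the leaf is the last digit (value % 10).
    Empty stems are kept so gaps in the data stay visible.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines


# Hypothetical quiz scores
for line in stem_and_leaf([62, 75, 71, 88, 64, 79, 93, 77]):
    print(line)
```

Turned on its side, the rows of leaves trace the same shape a histogram with bins of width 10 would show.
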
Dot Plot Graphs a dot for each case against a single axis
Mode A hump or local high point in the shape of a distribution
Unimodal Having one mode (bump)
Bimodal Having two modes (bumps)
Multimodal Having more than two modes (bumps)
Uniform A distribution that is roughly flat
Symmetric A distribution where two sides are mirror images of each other
Skewed When a distribution is not symmetric because one tail stretches out farther than the other; it is skewed toward the side with the longer tail
Outliers Any value more than 1.5 IQR below Q1 or above Q3 (the ends of the box in a box plot); more generally, a point that does not fit the overall pattern
Far Outlier Any value more than 3 IQR below Q1 or above Q3
Median The middle value of the ordered data. Usually used as the measure of center when data are skewed or have outliers
Range Maximum - Minimum
1st Quartile 25% of data lies below
3rd Quartile 75% of data lies below
Interquartile Range (IQR) Q3 - Q1
5 Number Summary The minimum, Q1, median, Q3, and maximum; the same values a box and whisker plot displays
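
The sketch below computes the 5-number summary, range, IQR, and the outlier fences from the definitions above. It assumes the "median of each half" quartile convention commonly taught in AP Statistics: Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when n is odd. Statistical software may use a different convention and give slightly different quartiles. The data set is made up.

```python
def median(sorted_vals):
    """Median of an already-sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least 2 values."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]          # values below the median position
    upper = data[(n + 1) // 2 :]    # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]


def outlier_fences(values, k=1.5):
    """Fences k IQRs beyond the quartiles (k=1.5 for outliers, k=3 for far outliers)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


data = [2, 4, 5, 7, 8, 10, 30]
summary = five_number_summary(data)
print(summary)                               # (2, 4, 7, 10, 30)
print("range:", summary[4] - summary[0])     # range: 28
print("IQR:", summary[3] - summary[1])       # IQR: 6
low, high = outlier_fences(data)
print([x for x in data if x < low or x > high])  # [30]
```

With k=3 the fences move to -14 and 28, so 30 is a far outlier as well.
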
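Mean The sum of the values divided by how many there are; the measure of center used when data are symmetric
Standard Deviation A measure of spread; roughly the typical distance of values from the mean, computed as the square root of the average squared deviation from the mean (dividing by n - 1 for a sample)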
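
For a sample, the standard deviation is s = sqrt( sum of (y - y-bar)^2 / (n - 1) ). The sketch below computes it directly and checks it against Python's `statistics.stdev`, which uses the same n - 1 divisor; `statistics.pstdev` divides by n instead (the population version). The data are arbitrary.

```python
import math
import statistics


def sample_sd(values):
    """Square root of the sum of squared deviations from the mean over n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(y - mean) ** 2 for y in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data))          # about 2.138
print(statistics.stdev(data))   # same value
print(statistics.pstdev(data))  # 2.0, dividing by n instead of n - 1
```

Squaring before averaging is why the standard deviation is only roughly, not exactly, the average distance from the mean.
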
Time Plot Displays data that change over time
Parameter A population measure
Statistic A numerical measure calculated from a sample of data
Z-Score Tells how many standard deviations a value is from the mean. Z-scores have no units, so we can use them to compare values measured on different scales
68-95-99.7 Rule In a Normal model, about 68% of values fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations
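
A short sketch tying these two entries together: `z_score` standardizes a value, and `fraction_within` uses the standard Normal model from Python's `statistics.NormalDist` to check the rule's percentages. The exam scores are hypothetical.

```python
from statistics import NormalDist


def z_score(value, mean, sd):
    """Standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


def fraction_within(k):
    """Probability that a Normal value falls within k standard deviations of its mean."""
    z = NormalDist(mu=0, sigma=1)
    return z.cdf(k) - z.cdf(-k)


# Hypothetical scores: 85 on a test (mean 75, SD 5) vs 90 on another (mean 80, SD 10)
print(z_score(85, mean=75, sd=5))   # 2.0 -> the more unusual score
print(z_score(90, mean=80, sd=10))  # 1.0

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.3f}")
```

The printed fractions are about 0.683, 0.954, and 0.997, which round to the 68%, 95%, and 99.7% of the rule.
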
Scatterplot Shows the relationship between two quantitative variables
Response Variable The variable you hope to predict or explain (y)
Explanatory Variable The variable you use to account for, explain, or predict the response variable (x)
Correlation Coefficient A number between -1 and 1 that measures the direction and strength of a linear association
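
One way to compute r, following the formula r = (sum of z_x * z_y) / (n - 1), where each z-score uses the sample mean and sample standard deviation. The paired data are invented. (Python 3.10+ also provides `statistics.correlation`, which should give the same value.)

```python
import statistics


def correlation(x, y):
    """Pearson correlation: sum of products of z-scores, divided by n - 1."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)
    total = sum(((a - mean_x) / sd_x) * ((b - mean_y) / sd_y) for a, b in zip(x, y))
    return total / (n - 1)


# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correlation(x, y), 3))  # 0.775
```

Points on a perfectly increasing line give r = 1 and points on a perfectly decreasing line give r = -1.
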
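Lurking Variable A variable other than x and y that affects both of them, accounting for the correlation between the two
Residual The difference between an observed data value and its corresponding predicted value (residual = observed - predicted)
Slope b1 gives the change in predicted y for each one-unit increase in x, in units of y per unit of x
Intercept b0 is the predicted value of y when x is zero
se The standard deviation of the residuals
R-squared (R²) The fraction of the variation in y accounted for by the linear model; an overall measure of how successful the regression is in linearly relating y to x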
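
A sketch of the least-squares line and the quantities defined above, using made-up data. It computes the slope as the sum of products of x- and y-deviations divided by the sum of squared x-deviations, which is algebraically the same as the textbook form b1 = r * sy / sx, and the intercept as b0 = y-bar - b1 * x-bar. It takes se = sqrt(sum of squared residuals / (n - 2)) and R² = 1 - (sum of squared residuals) / (sum of squared deviations of y).

```python
import math
import statistics


def least_squares(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx              # slope: change in predicted y per unit of x
    b0 = mean_y - b1 * mean_x   # intercept: predicted y when x = 0
    return b0, b1


def regression_summary(x, y):
    b0, b1 = least_squares(x, y)
    predicted = [b0 + b1 * a for a in x]
    residuals = [obs - pred for obs, pred in zip(y, predicted)]  # observed - predicted
    sse = sum(e ** 2 for e in residuals)
    mean_y = statistics.mean(y)
    sst = sum((b - mean_y) ** 2 for b in y)
    return {
        "b0": b0,
        "b1": b1,
        "residuals": residuals,
        "se": math.sqrt(sse / (len(x) - 2)),  # standard deviation of the residuals
        "r_squared": 1 - sse / sst,           # fraction of y's variation explained
    }


# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
result = regression_summary(x, y)
print(f"y-hat = {result['b0']:.1f} + {result['b1']:.1f}x")  # y-hat = 2.2 + 0.6x
print(f"se = {result['se']:.3f}, R^2 = {result['r_squared']:.2f}")  # se = 0.894, R^2 = 0.60
```

For a least-squares line the residuals always sum to (essentially) zero, and R² equals the square of the correlation coefficient.
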
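Extrapolation Using a regression model to predict y for x-values outside the range of the data; predicting the future is a common example
Leverage Points with x-values far from the mean of x have high leverage: they can pull the line close to them and so have a large effect on the slope and intercept
Influential Point A point that, when omitted, gives a very different regression model
Ladder of Powers Orders the common re-expressions by strength (powers 2, 1, 1/2, 0 for the logarithm, -1/2, and -1) to guide which one to try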
Bias Any systematic failure of a sampling method to represent its population
Census A sample that consists of the entire population
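Simple Random Sample (SRS) A sample in which every possible sample of the same size has an equal chance of being selected
Stratified Random Sample The population is divided into homogeneous sub-populations (strata), and a simple random sample is drawn within each stratum
Cluster Sample The population is divided into groups (clusters), and entire clusters are chosen at random
Multistage Sample A sample that combines several sampling methods
Systematic Sample A sample drawn by selecting every kth individual from a list, starting from a randomly chosen individual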
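
The sampling designs above can be sketched with Python's `random` module. The population, strata, and cluster names are hypothetical, and a seeded `random.Random` is passed in only so results are reproducible. A multistage sample would chain these functions, for example choosing clusters and then drawing an SRS within each chosen cluster.

```python
import random


def simple_random_sample(population, n, rng):
    """SRS: every subset of size n is equally likely."""
    return rng.sample(population, n)


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a separate SRS of the same size from each stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


def systematic_sample(population, k, rng):
    """Take every kth individual, starting at a random position among the first k."""
    start = rng.randrange(k)
    return population[start::k]


rng = random.Random(2024)
students = list(range(1, 101))   # hypothetical student ID numbers 1-100
print(simple_random_sample(students, 5, rng))
print(systematic_sample(students, 20, rng))
```
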
Pilot Small trial run of a survey to see if questions are clear
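Convenience Sample A sample made up of individuals who are convenient to reach
Undercoverage When some groups in the population are left out of the sampling process or are represented less than they are in the population
Voluntary Response Bias Bias that arises when individuals choose for themselves whether to respond, so the sample over-represents people with strong opinions
Non-response Bias Bias that arises when a large fraction of those sampled do not respond, so the responses we receive may differ from those we would have gotten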
Response Bias Anything in a survey design that influences responses
Observational Study A study based on data in which no manipulation of factors has been employed
Retrospective Study An observational study where subjects are selected and their previous conditions are determined
Prospective Study An observational study where subjects are followed to observe future outcomes
Experiment A study in which the experimenter manipulates factor levels to create treatments, randomly assigns subjects to them, and compares the responses
Factor A variable whose levels are manipulated by the experimenter
Statistically Significant When an observed difference is too large for us to believe that it occurred by chance alone
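Control Group The group that serves as the baseline for comparison in an experiment, often receiving a placebo or the standard treatment
Blinding When any individual associated with an experiment is unaware of how subjects have been treated
Single-Blind When one of the two groups (subjects or evaluators) is blinded
Double-Blind When both subjects and evaluators are blinded
Placebo A treatment known to have no effect, given so that all groups experience the same conditions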
Placebo Effect Tendency of humans to show a response even when they have been given a placebo
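Blocking Grouping similar experimental units into blocks and randomly assigning treatments within each block, which removes the variability between blocks from the comparison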
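
A sketch of a randomized block design: units are grouped into blocks (here, hypothetical "younger" and "older" subjects), and treatments are randomly assigned separately within each block, so every block gets a balanced share of each treatment. The subject labels and treatment names are illustrative.

```python
import random


def randomize_within_blocks(blocks, treatments, rng):
    """Shuffle each block's units, then deal the treatments out in turn within that block."""
    assignment = {}
    for units in blocks.values():
        shuffled = list(units)
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment


blocks = {
    "younger": ["Y1", "Y2", "Y3", "Y4"],
    "older": ["O1", "O2", "O3", "O4"],
}
assignment = randomize_within_blocks(blocks, ["drug", "placebo"], random.Random(7))
for unit, treatment in sorted(assignment.items()):
    print(unit, treatment)
```

Comparing drug and placebo within each block keeps age differences from being mixed up with the treatment effect.
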
Matching In a retrospective or prospective study, subjects who are similar in ways not under study may be matched and compared to each other
Confounding When the levels of one factor are associated with the levels of another factor in such a way that their effects cannot be separated
Trial A single attempt or realization of a random phenomenon
Outcome The value measured or reported from a trial
Sample Space Collection of all possible outcomes
Law of Large Numbers The long-run relative frequency of an event in repeated independent trials approaches its true (theoretical) probability as the number of trials grows
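
A quick simulation of the Law of Large Numbers: flip a simulated fair coin many times and track the running proportion of heads. The seed and the number of flips are arbitrary choices.

```python
import random


def running_proportion(n_flips, p=0.5, seed=None):
    """Proportion of successes after each of n_flips simulated trials."""
    rng = random.Random(seed)
    successes = 0
    proportions = []
    for i in range(1, n_flips + 1):
        if rng.random() < p:   # success with probability p
            successes += 1
        proportions.append(successes / i)
    return proportions


props = running_proportion(100_000, seed=42)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(props[n - 1], 4))
```

The early proportions can wander well away from 0.5, while the later ones settle close to it.
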
Independence When the probability of one event occurring has no effect on the probability of a second event occurring
Probability A number between 0 and 1 that describes the chance of an event occurring
Disjoint When two events share no common outcomes
Mutually Exclusive When two events share no common outcomes
Expected Value The theoretical long-run average value of a random variable, also called its mean: the sum of each value times its probability
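
For a discrete random variable, E(X) = sum of x * P(x). The sketch below uses a made-up game whose payoffs and probabilities are purely illustrative, and it checks that the probabilities sum to 1 first.

```python
def expected_value(distribution):
    """E(X) = sum of x * P(x) over a dict mapping value -> probability."""
    total_prob = sum(distribution.values())
    if abs(total_prob - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in distribution.items())


# Hypothetical game: win $10 with probability 0.1, $1 with 0.3, nothing otherwise
game = {10: 0.1, 1: 0.3, 0: 0.6}
print(expected_value(game))  # about 1.3 dollars per play in the long run
```

No single play pays $1.30; the expected value describes the average over many plays.
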
Bernoulli Trials Trials with two possible outcomes (success or failure), a constant probability of success, and independence from trial to trial
Geometric Probability Model Counts the number of Bernoulli trials until the first success
Binomial Probability Model Counts the number of successes in a fixed number, n, of Bernoulli trials
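
The two models have short probability formulas: geometric P(X = k) = (1 - p)^(k - 1) * p, and binomial P(X = k) = C(n, k) * p^k * (1 - p)^(n - k). A sketch using `math.comb` (Python 3.8+); the example values of n, k, and p are arbitrary.

```python
from math import comb


def geometric_pmf(k, p):
    """P(first success occurs on trial k), for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Probability the first success comes on the 3rd trial when p = 0.2
print(geometric_pmf(3, 0.2))        # 0.8 * 0.8 * 0.2, about 0.128
# Probability of exactly 2 heads in 4 fair coin flips
print(binomial_pmf(2, 4, 0.5))      # 6 / 16 = 0.375

# The binomial mean should equal n * p (here 10 * 0.3 = 3)
mean = sum(k * binomial_pmf(k, 10, 0.3) for k in range(11))
print(round(mean, 6))               # 3.0
```
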
Central Limit Theorem The sampling distribution model of a sample mean (or sample proportion) from a random sample is approximately Normal for large n, regardless of the distribution of the population, as long as the observations are independent
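
A simulation sketch of the Central Limit Theorem. The population is an exponential distribution with mean 1 and standard deviation 1, which is strongly right-skewed; it was chosen only as an example. Means of samples of size n should cluster around 1 with a standard deviation close to 1/sqrt(n), and roughly 95% of them should land within 2 of those standard deviations, as a Normal model predicts. The sample size, number of repetitions, and seed are arbitrary.

```python
import math
import random
import statistics


def simulate_sample_means(n, reps, seed=None):
    """Means of `reps` random samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]


n = 50
means = simulate_sample_means(n, reps=2_000, seed=1)
sd_model = 1 / math.sqrt(n)   # sigma / sqrt(n), with sigma = 1 for this population
within_2sd = sum(abs(m - 1) < 2 * sd_model for m in means) / len(means)

print(round(statistics.fmean(means), 3))  # close to 1
print(round(statistics.stdev(means), 3), round(sd_model, 3))
print(within_2sd)                          # close to 0.95
```
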
Standard Error An estimate of the standard deviation of a sampling distribution, computed from statistics found in the data
Confidence Interval An interval of plausible values for a parameter, computed from sample data with a stated level of confidence
Margin of Error The "give or take" in a confidence interval: the interval is the estimate plus or minus the margin of error
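
A sketch of a one-proportion z-interval, which ties together the last three entries: the standard error SE = sqrt(p-hat * (1 - p-hat) / n), the margin of error ME = z* * SE, and the interval p-hat plus or minus ME. It assumes the usual conditions for a Normal model hold (random, independent observations and enough successes and failures); the survey counts are hypothetical.

```python
import math
from statistics import NormalDist


def one_proportion_z_interval(successes, n, confidence=0.95):
    """Return (lower, upper) for p-hat +/- z* * SE."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)               # standard error of p-hat
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)    # about 1.96 for 95%
    me = z_star * se                                       # margin of error
    return p_hat - me, p_hat + me


# Hypothetical survey: 520 of 1000 respondents say yes
low, high = one_proportion_z_interval(520, 1000)
print(f"({low:.3f}, {high:.3f})")   # (0.489, 0.551)
```

Raising the confidence level increases z*, which widens the interval.
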
Alpha Level The "threshold" p value that determines whether we reject a null hypothesis
Statistically Significant When the p value falls below the alpha level
Significance Level Another name for alpha level
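
A sketch of how the alpha level is used, with a two-sided one-proportion z-test. It assumes the Normal model conditions hold, and the coin-flip counts are hypothetical. The standard deviation uses the hypothesized value p0, not p-hat.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alpha=0.05):
    """Two-sided test of H0: p = p0. Returns (z, p_value, reject_h0)."""
    p_hat = successes / n
    sd = math.sqrt(p0 * (1 - p0) / n)               # SD of p-hat if H0 is true
    z = (p_hat - p0) / sd
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # area in both tails
    return z, p_value, p_value < alpha


# Hypothetical: is a coin fair if it lands heads 560 or 520 times in 1000 flips?
for heads in (560, 520):
    z, p_value, reject = one_proportion_z_test(heads, 1000, p0=0.5)
    print(f"{heads} heads: z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

With 560 heads the p-value is far below 0.05, so we reject H0; with 520 heads the p-value is about 0.21, so we fail to reject.
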
Type I Error Rejecting a null hypothesis when it is true (also called a false positive); its probability is the alpha level, denoted α
Type II Error Failing to reject a null hypothesis when it is false (also called a false negative); its probability is denoted β
Power The probability that a hypothesis test will correctly reject a false null hypothesis; power = 1 - β
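
A sketch of computing power for a two-sided one-proportion z-test, assuming a hypothetical true proportion `p_true` and the Normal approximation for the sampling distribution of p-hat. Power is the chance that p-hat lands in the rejection region when the true proportion is `p_true`; the Type II error probability β is 1 minus the power. The values of p0, p_true, and n are arbitrary.

```python
import math
from statistics import NormalDist


def power_one_proportion(p0, p_true, n, alpha=0.05):
    """Power of a two-sided z-test of H0: p = p0 when the true proportion is p_true."""
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    sd0 = math.sqrt(p0 * (1 - p0) / n)                            # SD of p-hat under H0
    lower_cut, upper_cut = p0 - z_star * sd0, p0 + z_star * sd0   # reject outside these
    sampling = NormalDist(mu=p_true, sigma=math.sqrt(p_true * (1 - p_true) / n))
    return sampling.cdf(lower_cut) + (1 - sampling.cdf(upper_cut))


for n in (100, 400, 1000):
    power = power_one_proportion(p0=0.5, p_true=0.55, n=n)
    print(f"n = {n}: power = {power:.3f}, beta = {1 - power:.3f}")
```

Power grows with the sample size. When `p_true` equals p0, the function returns alpha, the Type I error rate.
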

