# Probability & Statistics

## 1,320 terms

### Bar graph

Displays the distribution of a categorical variable

### Binary variable

Categorical variable with exactly two categories, such as gender (male or female)

### Categorical variable

Records a group designation such as gender

### Data

Numbers or categories recorded for the observational units in a study

### Distribution (of a variable)

Refers to its pattern of variation. With a categorical variable, distribution means the variable's possible categories and the proportion of responses in each

### Dot plot

Useful for displaying the distribution of a relatively small data set of a quantitative variable

### Observational unit

Person or thing assigned a number or category

### Quantitative variable

Measures a numerical characteristic such as height

### Variability

Phenomenon of a variable taking on different values or categories from observational unit to observational unit.

### Variable

Any characteristic of a person or thing that can be assigned a number or category

### compound

a sequence of simple events

### counting principle

a method for finding the number of possible outcomes in an experiment by multiplying the number of choices available at each stage

### event

a subset of the sample space

### experimental probability

the ratio of the number of times an outcome occurs to the total number of trials performed

### Independent events

events for which the occurrence of one has no impact on the occurrence of the other

### Outcome

a possible result of an experiment

### Probability

a measure of the likelihood of an event

### relative frequency

the number of times an outcome occurs divided by total number of trials

### sample space

all possible outcomes of given experiment

### simple event

an event consisting of just one outcome.

### theoretical probability

The mathematical calculation that an event will happen in theory

### tree diagram

a tree-shaped diagram that illustrates sequentially the possible outcomes of a given event

### binomial distribution

a theoretical distribution of the number of successes in a finite set of independent trials with a constant probability of success
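The definition above can be made concrete with the standard binomial formula, P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below is only an illustration: the function name `binomial_pmf` and the coin-flip numbers are assumptions, not part of the original cards.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): exactly k successes in n independent trials, each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: probability of exactly 5 heads in 10 fair coin flips.
p_five_heads = binomial_pmf(5, 10, 0.5)

# The probabilities for k = 0..n should add up to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
```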

### causation

A cause and effect relationship in which one variable controls the changes in another variable.

### central limit theorem

Regardless of the shape of the population distribution, the sampling distribution of the sample mean is approximately normal if the sample size n is large enough (a common rule of thumb is n > 30).
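A small simulation can illustrate the theorem. This is a rough sketch under assumptions: the exponential population, the sample size of 40, the number of samples, and the seed are all arbitrary choices made for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 40     # n, chosen to be above 30
NUM_SAMPLES = 5000   # how many samples to draw

# Exponential population with rate 1: mean 1, standard deviation 1, strongly right-skewed.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)  # should be close to the population mean, 1
sd_of_means = statistics.stdev(sample_means)    # should be close to sigma / sqrt(n)
expected_sd = 1 / SAMPLE_SIZE ** 0.5
```

Even though the population is skewed, a histogram of `sample_means` would look roughly bell-shaped and centered near 1.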

### cluster sampling

divide population into sections then randomly select some of those clusters and then choose ALL members from selected clusters

### confounding

a situation where the effect of one variable on the response variable cannot be separated from the effect of another variable on the response variable.

### control group

the group that does not receive the experimental treatment.

### correlation

measuring the strength and direction of the relationship between two numerical variables

### degrees of freedom

A concept used in tests of statistical significance; the number of observations that are free to vary to produce a known outcome.

### density curve

describes the overall pattern of a distribution, area = 1

### discrete random variables

A random variable that assumes countable values

### disjoint events

mutually exclusive, events that have no outcomes in common

### double blind experiments

experiments in which neither the participants nor the people analyzing the results know who is in the control group

### Empirical Rule

The rule gives the approximate percentage of observations within 1 standard deviation (68%), 2 standard deviations (95%), and 3 standard deviations (99.7%) of the mean when the histogram is well approximated by a normal curve
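The three percentages can be checked against an exact normal curve. A minimal sketch using Python's `statistics.NormalDist`; the helper name `share_within` is an assumption made for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k: float) -> float:
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Roughly 0.68, 0.95, and 0.997 for k = 1, 2, 3.
within = {k: share_within(k) for k in (1, 2, 3)}
```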

### explanatory variables

the treatment (ex. studying or not studying); factors

### factor

an independent variable in statistics

### five-number summary

minimum, 1st quartile, median, 3rd quartile, maximum
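One way to compute the five numbers is sketched below. Quartile conventions differ between textbooks and calculators, so the `method="inclusive"` choice is an assumption, and the function name and data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(data)
    # "inclusive" is one of several quartile conventions; others can give
    # slightly different Q1 and Q3 values for the same data.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

summary = five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
iqr = summary[3] - summary[1]  # interquartile range: Q3 - Q1
```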

### geometric distribution

Success / Failure, trials continue until successful, each outcome is independent, constant probability of success

### independent events

The outcome of one event does not affect the outcome of the second event

### inference

drawing conclusions that go beyond the data at hand

### influential observations

Individual points that substantially change the regression line when removed. Often outliers in the x direction, but they need not have large residuals.

### interquartile range

The difference between the upper and lower quartiles.

### law of large numbers

as an experiment is repeated over and over, the empirical probability of an event approaches the actual probability of the event
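A quick simulation shows the idea. This is a sketch only: the function name `observed_proportion`, the seed, and the trial counts are assumptions chosen for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

def observed_proportion(num_trials: int, p_success: float = 0.5) -> float:
    """Simulate independent trials and return the observed proportion of successes."""
    successes = sum(random.random() < p_success for _ in range(num_trials))
    return successes / num_trials

# With more trials, the empirical probability tends to settle near the true value 0.5.
proportions = {n: observed_proportion(n) for n in (10, 1_000, 100_000)}
```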

### lurking variable

A lurking variable is a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables.

### margin of error

The ± value added to and subtracted from a point estimate in order to develop an interval estimate of a population parameter

### matched pairs design

A matched pairs design is a special case of the randomized block design. It is used when the experiment has only two treatment conditions; and subjects can be grouped into pairs, based on some blocking variable. Then, within each pair, subjects are randomly assigned to different treatments.

### mean

an average of n numbers computed by adding some function of the numbers and dividing by some function of n

### median

the value below which 50% of the cases fall

### mutually exclusive

Events that cannot occur at the same time.

### normal distribution

A function that represents the distribution of variables as a symmetrical bell-shaped graph.

### p-value

measure of how rare the sample results would be if Ho were true

### parameters

numbers that describe a population

### randomization

the best defense against bias, in which each individual is given a fair, random chance of selection

### replication

the repetition of an experiment in order to test the validity of its conclusion

### residual

the difference between the observed value and the predicted value of a regression equation; y - y-hat

### response bias

people answer questions the way they think you want them answered. There are some questions they simply don't want to answer truthfully.

### sampling distribution

a distribution of statistics obtained by selecting all the possible samples of a specific size from a population

### sampling variability

the natural tendency of randomly drawn samples to differ

### scatterplots

a graphed cluster of dots, each of which represents the values of two variables. The slope of the points suggests the direction of the relationship between the two variables. The amount of scatter suggests the strength of the correlation.

### scope of inference

to whom the generalization of the inference may be directed

### simple random sample

abbreviated SRS, this requires that every item in the population has an equal chance to be chosen and that every possible combination of items has an equal chance to exist. No grouping can be involved.

### Simpson's paradox

conclusions drawn from two or more separate crosstabulations that can be reversed when the data are aggregated

### single blind experiments

an experiment in which the participants are unaware of which participants received the treatment

### skewed

a distribution is this if it's not symmetric and one tail stretches out farther than the other

### slope

the average change in the response variable as the explanatory variable increases by one

### spread

how variable the data are; measured by standard deviation, IQR, variance, range

### standard deviation

a measure of variability that describes an average distance of every score from the mean

### standard error

the standard deviation of a sampling distribution

### standard normal curve

A normal distribution with mean of zero and standard deviation of one. Probabilities are given in Table A for values of the standard Normal variable.

### statistical significance

said to exist when the probability that the observed findings are due to chance is very low

### statistic

A numerical measurement describing some characteristic of a sample

### stratified random sample

a sample in which the population is first divided into similar, nonoverlapping groups. A simple random sample is then selected from each of the groups

### symmetric

a distribution is this if the two halves on either side of the center look approximately like mirror images of each other

### Type I Error

The error that is committed when a true null hypothesis is rejected erroneously. The probability of a Type I Error is abbreviated with the lowercase Greek letter alpha.

### Type II Error

the error of failing to reject a null hypothesis when in fact it is false (also called a "false negative"). the probability of a Type II error is commonly denoted β and depends on the effect size.

### unbiased estimator

a statistic whose sampling distribution is centered over the population parameter

### undercoverage

occurs when some groups in the population are left out of the process of choosing the sample

### variance

standard deviation squared, a measure of spread
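The relationship between the two measures can be checked numerically. A minimal sketch with made-up scores; it uses the population formulas (dividing by n), while sample formulas divide by n - 1.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

pop_variance = statistics.pvariance(scores)  # average squared distance from the mean
pop_sd = statistics.pstdev(scores)           # square root of the variance

# The sample version divides by n - 1 instead of n, so it is slightly larger.
sample_variance = statistics.variance(scores)
```

Here the squared deviations add up to 32, so the population variance is 32 / 8 = 4 and the standard deviation is 2.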

### voluntary response

Individuals with strong feelings about a subject are more likely than others to respond. Such a study is interesting but not reflective of the population.

### y-intercept

predicted value when the x variable is zero

### z score

a measure of how many standard deviations you are away from the norm (average or mean)
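As a formula, z = (value - mean) / standard deviation. The short sketch below uses hypothetical numbers (a scale with mean 100 and standard deviation 15).

```python
def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

z_high = z_score(130, 100, 15)  # two standard deviations above the mean
z_low = z_score(85, 100, 15)    # one standard deviation below the mean
```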

### Addition Rule for Disjoint Events

If A and B are disjoint events: P(A or B) = P(A) + P(B)

### Conditional Distribution

the distribution of a variable restricting the who to consider only a smaller group of individuals

### Conditional Probability

A probability that takes into account a given condition.

### Disjoint Events

mutually exclusive, events that have no outcomes in common

### General Addition Rule

For any two events A and B (disjoint or not), the probability of A or B is:
P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

### General Multiplication Rule

If A and B are any two events, then
P(A & B) = P(A) x P(B|A)

### Independence (Casually)

Two events are independent if knowing whether one event occurs does not alter the probability that the other event occurs

### Independence (Formally)

P(B|A) = P(B) when A and B are independent.

### Sample Space

The collection of all possible outcomes

### Tree Diagram

a diagram used to show the total number of possible outcomes in a probability experiment

### Addition and Subtraction

-Mean, Median, and Mode are affected
-Cannot subtract SD; Only square, add, and square root
-Range is NOT affected

### Continuous Random Variable

-Random variable that assumes values associated with one or more intervals on the number line

### Discrete Random Variable

-Random variable with a countable number of outcomes

### Event

-Subset of the sample space

### Independent Events

-If the knowledge of one event having occurred does not change the probability that the other event occurs

### Law of Large Numbers

-States that the proportion of successes in the simulation should become, over time, close to the true proportion in population

### Multiplication and Division

-Mean, Median, Mode, Range, and SD are affected
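The effect of shifting versus scaling a data set can be checked directly. A small sketch with made-up data; the helper `describe` and the constants 10 and 3 are assumptions for illustration.

```python
import statistics

data = [1, 2, 3, 4, 5]
shifted = [x + 10 for x in data]  # add a constant to every value
scaled = [x * 3 for x in data]    # multiply every value by a constant

def describe(values):
    """Return the mean, range, and (population) standard deviation of values."""
    return {
        "mean": statistics.fmean(values),
        "range": max(values) - min(values),
        "sd": statistics.pstdev(values),
    }

original, after_shift, after_scale = describe(data), describe(shifted), describe(scaled)
```

Adding 10 moves the mean from 3 to 13 but leaves the range and SD alone; multiplying by 3 triples the mean, the range, and the SD.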

### Mutually Exclusive Event

-If they have no outcomes in common
-One cannot happen with the other

### Probability Distribution for a Discrete Random Variable

-Possible values of the discrete random variable together with their respective probabilities

### Probability Distribution for a Random Variable

-Possible values of the random variable X together with the probabilities corresponding to those values

### Random Phenomenon

-An activity whose outcome we can observe or measure but we do not know how it will turn out on any single trial

### complementary

two events that cannot occur together, but one must happen

### dependent

events do impact each other's probability

### Independent

events do not impact each other's probability

### Mutually exclusive/disjoint

two events that cannot occur simultaneously

### Bayes's Theorem

Suppose that A₁, A₂, ..., Ak are disjoint events whose probabilities are not 0 and add to exactly 1, i.e. any outcome must be in exactly one of those events. Then, if B is any other event whose probability is not 0 or 1,
P(Ai|B) = P(B|Ai)P(Ai) / [P(B|A₁)P(A₁) + ... + P(B|Ak)P(Ak)]
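The theorem translates into a few lines of code. This is a sketch under assumptions: the function name `bayes_posteriors` and the three-event example (priors and likelihoods) are invented for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def bayes_posteriors(priors, likelihoods):
    """Return P(A_i | B) for each i, given the lists P(A_i) and P(B | A_i)."""
    joint = [prior * like for prior, like in zip(priors, likelihoods)]  # P(B | A_i) P(A_i)
    p_b = sum(joint)  # the denominator: total probability of B
    return [j / p_b for j in joint]

# Hypothetical example with three disjoint events whose probabilities add to 1.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]
posteriors = bayes_posteriors(priors, likelihoods)
```

In this example P(B) = 17/1000, so the posteriors are 5/17, 6/17, and 6/17, which add to 1.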

### Conditional Probability

The probability of some event given that some other event occurs.

### Disjoint ≠ Independent

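If two events are disjoint, then the occurrence of one means the non-occurrence of the other, so disjoint events with nonzero probability are dependent. If events are independent, then the occurrence or non-occurrence of one does not change the probability of the other.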

### General Addition Rule for Any Two Events

For any two events A and B,
P(A or B) = P(A) + P(B) - P(A and B)
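The rule can be verified by counting outcomes. A minimal sketch assuming one roll of a fair die; the events A and B are chosen only for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
A = {2, 4, 6}                      # the roll is even
B = {4, 5, 6}                      # the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

direct = prob(A | B)                         # count the outcomes in "A or B"
by_rule = prob(A) + prob(B) - prob(A & B)    # general addition rule
```

Subtracting P(A and B) removes the outcomes 4 and 6, which would otherwise be counted twice.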

### General Multiplication Rule for Any Two Events

The probability that both of two events A and B happen together can be found by
P(A and B) = P(A)P(B|A)

### Independence Definition

When the outcome of one event cannot influence the outcome of a second event.

### Outcomes for a diagnostic test

There are four possible outcomes:
- true positive
- true negative
- false positive
- false negative

### Positive Predictive Value

PPV = # of true positives / total # positives

### Prevalence

Prevalence = number of diseased individuals / total number of individuals

### Sensitivity

P(positive|diseased)
Want to be as high as possible, to diagnose.

### Specificity

P(negative|non-diseased)
Want to be as high as possible to avoid false positives.
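The diagnostic-test measures above can all be read off the four outcome counts. The counts below are hypothetical, picked so that prevalence is 1%.

```python
# Hypothetical counts for 10,000 people screened with a diagnostic test.
true_pos, false_neg = 90, 10       # the 100 diseased individuals
false_pos, true_neg = 495, 9405    # the 9,900 non-diseased individuals

total = true_pos + false_neg + false_pos + true_neg

prevalence = (true_pos + false_neg) / total        # diseased / everyone
sensitivity = true_pos / (true_pos + false_neg)    # P(positive | diseased)
specificity = true_neg / (true_neg + false_pos)    # P(negative | non-diseased)
ppv = true_pos / (true_pos + false_pos)            # true positives / all positives
```

With these numbers, sensitivity is 0.90 and specificity is 0.95, yet the PPV is only about 0.15, because the disease is rare and false positives outnumber true positives.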

### The Multiplication Rule for Independent Events

P(A and B) = P(A)P(B)

### Tree Diagrams

Diagrams that will show P(A) as independent branches, then P(B|A) as branches coming off those branches, etc. until a final event is reached. The probability of any one event occurring can be calculated by multiplying the probabilities of each branch along the way.

### Venn Diagram

A diagram showing a sample space S and events as areas within S. Overlaps indicate non-disjoint events.

### When P(A)>0, the conditional probability of event B occurring given A occurs is

P(B|A) = P(A and B) / P(A)
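The formula can be tried out on a standard 52-card deck. This is a sketch with assumed event choices (hearts and kings); the helper names `prob` and `conditional` are hypothetical.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely (rank, suit) cards

hearts = {card for card in deck if card[1] == "hearts"}
kings = {card for card in deck if card[0] == "K"}

def prob(event):
    """Probability of drawing a card in event from the full deck."""
    return Fraction(len(event), len(deck))

def conditional(b, a):
    """P(B | A) = P(A and B) / P(A), assuming P(A) > 0."""
    return prob(a & b) / prob(a)

p_king_given_heart = conditional(kings, hearts)
```

Here P(king | heart) = 1/13, the same as P(king) = 4/52, so these two events are independent.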

1 - P(E)

### "None"

P(E)

### Addition Rule 1

When two events A and B are mutually exclusive, the probability that A or B will occur is
P(A or B) = P(A) + P(B)

### Addition Rule 2

If A and B are NOT mutually exclusive, then
P(A or B) = P(A) + P(B) - P(A and B)

### Classical Probability

P(E) = (number of ways E can occur) / (total number of outcomes)
Used when every outcome in the sample space is equally likely.

### Combinations Rule

Used when selecting a smaller number from a larger number but the order is NOT important.
nCr = n! / (r! (n-r)!)
n = total number of objects, r = number of objects selected
On calculator: enter amount(n), math, PRB, 3, enter amount(r), enter
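The same count can be computed in code. A small sketch: `n_choose_r` is a hypothetical helper that spells out the factorial formula, compared against Python's built-in `math.comb`; the committee example is made up.

```python
from math import comb, factorial

def n_choose_r(n: int, r: int) -> int:
    """Number of ways to choose r objects from n when order does not matter."""
    return factorial(n) // (factorial(r) * factorial(n - r))

# Hypothetical example: committees of 3 chosen from 10 people.
committees = n_choose_r(10, 3)
built_in = comb(10, 3)  # the standard library gives the same count
```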

### Complement Rule

P(not E) = 1 - P(E)
The complement of E is the set of outcomes in the sample space that are not included in the outcomes of E

### Conditional Probability

The probability that the second event B occurs given that the first event A has occurred can be found by dividing the probability that both events occurred by the probability that the first event has occurred. The formula is
P(B|A) = P(A and B) / P(A)

### Dependent Events

When the outcome or occurrence of the first event affects the outcome or occurrence of the second event in such a way that the probability is changed.
*without replacement = dependent events

### Empirical Probability

P(E) = (frequency for the class) / (total frequencies in the distribution) = f / n
Relies on actual experience to determine the likelihood of outcomes.

### Factorial Rule

Use this when you have "n" objects and you want to know how many different ways they can be arranged.
n!
On calculator: enter amount, math, arrow left to PRB, 4, enter

### Fundamental Counting Rule

Use this when you have different positions and you want to know how many options there are within those positions.
Example: 9 positions with 2 options each gives 2 × 2 × 2 × 2 × 2 × 2 × 2 × 2 × 2 = 2^9 = 512

### Independent Events

Two events A and B are independent events if the fact that A occurs does NOT affect the probability of B occurring.
*with replacement = independent events

### Multiple Combinations Rule

When you are taking more than one combination in a problem.
nCr * nCr

### Multiplication Rule 1

When two events are independent, the probability of both occurring is
P(A and B) = P(A) * P(B)

### Multiplication Rule 2

When two events are dependent, the probability of both occurring is
P(A and B) = P(A) * P(B|A)

### Mutually Exclusive Events

Two events that cannot occur at the same time (i.e., they have no outcomes in common).

### Permutations Rule

Used when selecting a smaller group from a larger group and you put them in a specific order.
*ORDER IS IMPORTANT
nPr = n! / (n-r)!
n = total number of objects, r = number of objects selected
On calculator: enter amount(n), math, PRB, 2, enter amount(r), enter
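Listing the arrangements makes it clear why order matters. A minimal sketch: the letters and the helper name `n_permute_r` are assumptions for illustration.

```python
from itertools import permutations
from math import factorial, perm

def n_permute_r(n: int, r: int) -> int:
    """Number of ordered arrangements of r objects chosen from n."""
    return factorial(n) // factorial(n - r)

# Hypothetical example: ordered pairs drawn from four letters.
arrangements = list(permutations("ABCD", 2))  # ('A', 'B') and ('B', 'A') both appear
count_by_formula = n_permute_r(4, 2)
count_built_in = perm(4, 2)
```

Because ('A', 'B') and ('B', 'A') are counted separately, there are 12 permutations here but only 6 combinations.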

### Subjective Probability

Uses a probability value based on an educated guess or estimate, employing opinions and inexact information.

### These are counting Rules.....

NOT trying to find probability!

### bar graph

quickly compares data in column form, the heights can also show percents

### box plots

made based off of the 5 number summary
modified - shows outliers

### cases

When the objects are people in a set of data

### categorical variable

places an individual into one of two or more groups or categories

### density curve

the overall pattern of a distribution, areas underneath give proportions of observations for the distribution

### distribution

the distribution of a variable tells us what values it takes and how often it takes these values
of a categorical variable - gives either the count or the percent of individuals that fall in each category

### examining terms for distribution

overall pattern, deviations, shape, center, spread, outlier
