Probability & Statistics

1320 terms by jlnovotny

Bar graph

Displays the distribution of a categorical variable

Binary variable

A categorical variable with exactly two possible categories, such as gender (male or female)

Categorical variable

Records a group designation such as gender

Data

Numbers or categories recorded for the observational units in a study

Distribution (of a variable)

Refers to its pattern of variation. For a categorical variable, the distribution lists the variable's possible categories and the proportion of responses in each.

Dot plot

Useful for displaying the distribution of a relatively small data set of a quantitative variable

Observational unit

Person or thing assigned a number or category

Quantitative variable

Measures a numerical characteristic such as height

Variability

Phenomenon of a variable taking on different values or categories from observational unit to observational unit.

Variable

Any characteristic of a person or thing that can be assigned a number or category

compound event

an event consisting of a sequence (or combination) of two or more simple events

counting principle

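if one stage of an experiment can occur in m ways and a second stage can occur in n ways, the two stages together can occur in m × n ways; used to find the number of possible outcomes in an experiment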

event

a subset of the sample space

experimental probability

the ratio of the number of times an outcome occurs to the total number of trials performed

Independent events

events for which the occurrence of one has no impact on the occurrence of the other

Outcome

a possible result of an experiment

Probability

a measure of the likelihood of an event

relative frequency

the number of times an outcome occurs divided by total number of trials

sample space

the set of all possible outcomes of a given experiment

simple event

an event consisting of just one outcome.

theoretical probability

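The probability of an event found by reasoning about the possible outcomes (for example, favorable outcomes divided by total equally likely outcomes) rather than by running trials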

tree diagram

a tree-shaped diagram that shows, in sequence, the possible outcomes of a given experiment

binomial distribution

a theoretical distribution of the number of successes in a finite set of independent trials with a constant probability of success
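
A minimal sketch of the binomial probability formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). The function name `binomial_pmf` and the coin-flip example are illustrative assumptions.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    if not 0 <= k <= n:
        return 0.0  # impossible number of successes
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: 10 fair-coin flips, probability of exactly 3 heads.
print(binomial_pmf(3, 10, 0.5))  # 0.1171875 (= 120/1024)
```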

causation

A cause and effect relationship in which one variable controls the changes in another variable.

central limit theorem

Regardless of the shape of the population distribution, the sampling distribution of the sample mean is approximately normal when n is large enough (a common rule of thumb is n ≥ 30).
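
A rough simulation sketch of the central limit theorem. It assumes an exponential population with mean 1 and SD 1, which is strongly right-skewed. It draws many samples of size 40 and records each sample mean. The function name and the simulation sizes are illustrative choices.

```python
import random
import statistics

def simulate_sample_means(n: int, num_samples: int, seed: int = 0) -> list[float]:
    """Draw num_samples samples of size n from an exponential(mean 1) population
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

means = simulate_sample_means(n=40, num_samples=5000)
# The CLT predicts the sample means are roughly normal with mean 1 and SD 1/sqrt(40) ≈ 0.158.
print(round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

A histogram of `means` would look close to a bell curve even though the population itself is skewed.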

cluster sampling

Divide the population into groups (clusters), randomly select some of those clusters, and then include ALL members of the selected clusters.
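
A small sketch of cluster sampling. It takes a simple random sample of cluster names and keeps every member of each chosen cluster. The homeroom data and the function name `cluster_sample` are hypothetical.

```python
import random
from typing import Optional

def cluster_sample(clusters: dict[str, list[str]], num_clusters: int,
                   seed: Optional[int] = None) -> list[str]:
    """Randomly choose num_clusters whole clusters and return ALL members of the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # SRS of cluster names
    return [member for name in chosen for member in clusters[name]]

# Hypothetical population: students grouped by homeroom.
homerooms = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2", "c3", "c4"],
    "D": ["d1", "d2"],
}
print(cluster_sample(homerooms, num_clusters=2, seed=1))
```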

confounding

a situation where the effect of one variable on the response variable cannot be separated from the effect of another variable on the response variable.

control group

the group that does not receive the experimental treatment.

correlation

a measure of the strength and direction of the linear relationship between two quantitative variables
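
A sketch of the correlation r computed from standardized values: r = Σ(z_x · z_y) / (n - 1). The data are hypothetical. On Python 3.10+ the standard library's `statistics.correlation` gives the same value.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Correlation r = sum(z_x * z_y) / (n - 1), using sample means and sample SDs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]  # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]  # hypothetical response values
print(round(pearson_r(xs, ys), 4))  # 0.7746
```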

degrees of freedom

A concept used in tests of statistical significance; the number of observations that are free to vary to produce a known outcome.

density curve

describes the overall pattern of a distribution; it lies on or above the horizontal axis, and the total area under it is exactly 1

discrete random variables

A random variable that assumes countable values

disjoint events

mutually exclusive, events that have no outcomes in common

double blind experiments

experiments in which neither the participants nor the people evaluating the results know which participants are in the treatment group and which are in the control group

Empirical Rule

The rule gives the approximate percentage of observations within 1 standard deviation (68%), 2 standard deviations (95%), and 3 standard deviations (99.7%) of the mean when the distribution is well approximated by a normal curve.
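
A quick check of where the 68-95-99.7 figures come from. For a standard normal Z, P(|Z| < k) = erf(k / √2), and `math.erf` is part of Python's standard library. The helper name is illustrative.

```python
import math

def normal_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(normal_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```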

explanatory variables

the variables that explain or influence changes in the response; in an experiment, the treatments (e.g., studying or not studying) or factors

factor

an independent variable in statistics

five-number summary

minimum, 1st quartile, median, 3rd quartile, maximum
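
A sketch of the five-number summary. Quartile conventions vary between textbooks and software. This version assumes the "median of each half" method common in introductory courses, which leaves out the overall median when n is odd. It needs at least 2 values.

```python
def median(values: list[float]) -> float:
    """Middle value of the sorted data (average of the two middle values when n is even)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def five_number_summary(values: list[float]) -> tuple:
    """(min, Q1, median, Q3, max) with Q1 and Q3 as medians of the lower and upper halves."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]          # values below the median position
    upper = s[(n + 1) // 2 :]    # values above the median position
    return s[0], median(lower), median(s), median(upper), s[-1]

print(five_number_summary([1, 3, 4, 7, 8, 9, 12, 15, 20]))  # (1, 3.5, 8, 13.5, 20)
```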

geometric distribution

Success/failure trials continue until the first success; the trials are independent, and the probability of success is the same on every trial
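
A minimal sketch of the geometric probability that the first success occurs on trial k: P(X = k) = (1 - p)^(k - 1) · p. The die example is hypothetical.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(first success on trial k) = (1 - p)^(k - 1) * p for k = 1, 2, 3, ..."""
    if k < 1:
        return 0.0
    return (1 - p) ** (k - 1) * p

# Hypothetical example: first 6 appears on the 3rd roll of a fair die.
print(geometric_pmf(3, 1 / 6))  # (5/6)^2 * (1/6) = 25/216 ≈ 0.1157
```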

independent events

The outcome of one event does not affect the outcome of the second event

inference

drawing conclusions that go beyond the data at hand

influential observations

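Individual points whose removal would markedly change the regression line. They are often outliers in the x direction, and they do not necessarily have large residuals, because they pull the line toward themselves.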

interquartile range

The difference between the upper and lower quartiles.

law of large numbers

as an experiment is repeated over and over, the empirical probability of an event approaches the actual probability of the event
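
A simulation sketch of the law of large numbers. It flips a simulated fair coin n times and reports the observed proportion of heads, which should settle near 0.5 as n grows. The seed and sample sizes are arbitrary choices.

```python
import random

def observed_proportion(num_flips: int, p: float = 0.5, seed: int = 0) -> float:
    """Simulate num_flips independent trials with success probability p
    and return the observed proportion of successes."""
    rng = random.Random(seed)
    successes = sum(rng.random() < p for _ in range(num_flips))
    return successes / num_flips

for n in (10, 100, 10_000, 100_000):
    print(n, observed_proportion(n))
```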

lurking variable

A lurking variable is a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables.

margin of error

The ± value added to and subtracted from a point estimate in order to develop an interval estimate of a population parameter
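
One common case, sketched under the assumption of a one-sample proportion with 95% confidence (z* = 1.96). The margin of error is z*·√(p̂(1 - p̂)/n). The poll numbers are hypothetical.

```python
import math

def proportion_margin_of_error(p_hat: float, n: int, z_star: float = 1.96) -> float:
    """z* * sqrt(p_hat * (1 - p_hat) / n); z* = 1.96 corresponds to 95% confidence."""
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 520 of 1000 respondents say yes.
p_hat, n = 0.52, 1000
moe = proportion_margin_of_error(p_hat, n)
print(f"{p_hat} ± {moe:.3f}")  # 0.52 ± 0.031
```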

matched pairs design

A matched pairs design is a special case of the randomized block design. It is used when the experiment has only two treatment conditions and subjects can be grouped into pairs based on some blocking variable. Then, within each pair, subjects are randomly assigned to different treatments.

mean

the arithmetic average: the sum of the n observations divided by n

median

the middle value of the ordered data; the value below which 50% of the cases fall

mutually exclusive

Events that cannot occur at the same time.

normal distribution

A continuous distribution with a symmetric, bell-shaped density curve, completely described by its mean μ and standard deviation σ.

p-value

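the probability, computed assuming H₀ is true, of obtaining sample results at least as extreme as those actually observed; a small p-value means the results would be rare if H₀ were true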
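
A sketch for one specific case. It assumes a z test statistic and a two-sided alternative. The p-value is then P(|Z| ≥ |z|) = erfc(|z| / √2). The statistic z = 2.1 is hypothetical.

```python
import math

def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z, i.e. 2 * P(Z >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))

print(round(two_sided_p_value(2.1), 4))  # 0.0357
```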

parameters

numbers that describe a population

randomization

the best defense against bias, in which each individual is given a fair, random chance of selection

replication

the repetition of an experiment in order to test the validity of its conclusion

residual

the difference between the observed value and the value predicted by a regression equation: residual = y - ŷ

response bias

people answer questions the way they think you want them answered, or are unwilling to answer some questions truthfully

sampling distribution

a distribution of statistics obtained by selecting all the possible samples of a specific size from a population
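
For a tiny hypothetical population, every possible sample can be listed. The sketch below does this for samples of size 2 drawn without replacement and builds the sampling distribution of the sample mean exactly. It also shows that this distribution is centered on the population mean, which is what makes the sample mean an unbiased estimator.

```python
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8]  # hypothetical tiny population
n = 2

# Every possible sample of size n (without replacement) and its sample mean.
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]
print(sorted(sample_means))

pop_mean = Fraction(sum(population), len(population))
mean_of_sample_means = sum(sample_means) / len(sample_means)
print(pop_mean, mean_of_sample_means)  # 5 5
```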

sampling variability

the natural tendency of randomly drawn samples to differ

scatterplots

a graphed cluster of dots, each of which represents the values of two variables. The slope of the points suggests the direction of the relationship between the two variables. The amount of scatter suggests the strength of the correlation.

scope of inference

to whom the generalization of the inference may be directed

simple random sample

abbreviated SRS; every item in the population has an equal chance to be chosen, and every possible sample of the given size has an equal chance to be selected. No grouping can be involved.

simpson's paradox

conclusions drawn from two or more separate crosstabulations (two-way tables of categorical variables) can be reversed when the data are aggregated into a single crosstabulation; the reversal is usually explained by a lurking variable
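
A worked sketch using hypothetical counts of (successes, patients) by treatment and case severity. Treatment A has the higher success rate within each severity group. Treatment B has the higher rate overall, because A was given mostly severe cases, so severity acts as the lurking variable.

```python
from fractions import Fraction

# Hypothetical counts: (successes, patients) for each treatment and severity group.
data = {
    "A": {"mild": (90, 100), "severe": (300, 1000)},
    "B": {"mild": (800, 1000), "severe": (20, 100)},
}

def group_rate(treatment: str, group: str) -> Fraction:
    """Success rate within one severity group."""
    successes, total = data[treatment][group]
    return Fraction(successes, total)

def overall_rate(treatment: str) -> Fraction:
    """Success rate after aggregating the groups into one table."""
    successes = sum(s for s, _ in data[treatment].values())
    totals = sum(t for _, t in data[treatment].values())
    return Fraction(successes, totals)

for g in ("mild", "severe"):
    print(g, float(group_rate("A", g)), float(group_rate("B", g)))  # A wins in each group
print("overall", float(overall_rate("A")), float(overall_rate("B")))  # B wins overall
```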

single blind experiments

an experiment in which the participants do not know whether they received the treatment

skewed

a distribution is this if it's not symmetric and one tail stretches out farther than the other

slope

the predicted (average) change in the response variable for each one-unit increase in the explanatory variable
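
A sketch of the least-squares line ŷ = a + b·x on hypothetical data. It uses b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. The residuals y - ŷ are then computed, and for a least-squares line they sum to zero.

```python
def least_squares(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope b, y-intercept a) of the least-squares line y-hat = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx      # predicted change in y per one-unit increase in x
    a = my - b * mx    # predicted y when x = 0
    return b, a

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
b, a = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # residual = y - y-hat
print(round(b, 3), round(a, 3))  # 0.6 2.2
```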

spread

how variable the data are; measured by standard deviation, IQR, variance, or range

standard deviation

a measure of variability that describes the typical distance of observations from the mean; it is the square root of the variance
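
A sketch of the sample standard deviation computed by hand, with divisor n - 1, and checked against Python's `statistics.stdev`. `statistics.pstdev` gives the population version, which divides by n. The data are hypothetical.

```python
import math
import statistics

def sample_sd(values: list[float]) -> float:
    """Square root of the sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(round(sample_sd(data), 4))  # 2.1381
print(statistics.pstdev(data))    # 2.0 (population SD, divides by n)
```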

standard error

the standard deviation of a sampling distribution

standard normal curve

A normal distribution with mean of zero and standard deviation of one. Probabilities are given in Table A for values of the standard Normal variable.

statistical significance

said to exist when the observed results would be very unlikely to occur by chance alone if the null hypothesis were true (that is, the p-value is at or below the significance level α)

statistic

A numerical measurement describing some characteristic of a sample

stratified random sample

a sample in which the population is first divided into similar, nonoverlapping groups. A simple random sample is then selected from each of the groups
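
A sketch of stratified random sampling. It takes a separate simple random sample from every stratum. Cluster sampling, by contrast, takes all members of only some groups. The school data, strata sizes, and function name are hypothetical.

```python
import random
from typing import Optional

def stratified_sample(strata: dict[str, list[str]], per_stratum: dict[str, int],
                      seed: Optional[int] = None) -> dict[str, list[str]]:
    """Take a separate simple random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum[name]) for name, members in strata.items()}

# Hypothetical school divided into strata by grade.
grades = {
    "9th": [f"s9_{i}" for i in range(40)],
    "10th": [f"s10_{i}" for i in range(30)],
    "11th": [f"s11_{i}" for i in range(20)],
}
sample = stratified_sample(grades, {"9th": 4, "10th": 3, "11th": 2}, seed=42)
print(sample)
```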

symmetric

a distribution is this if the two halves on either side of the center look approximately like mirror images of each other

Type I Error

The error that is committed when a true null hypothesis is rejected erroneously. The probability of a Type I Error is abbreviated with the lowercase Greek letter alpha.

Type II Error

The error of failing to reject a null hypothesis when it is in fact false (also called a "false negative"). The probability of a Type II error is commonly denoted β and depends on the effect size.

unbiased estimator

a statistic whose sampling distribution is centered over the population parameter

undercoverage

occurs when some groups in the population are left out of the process of choosing the sample

variance

standard deviation squared, a measure of spread

voluntary response

Individuals with strong feelings about a subject are more likely than others to respond. Such a study is interesting but not reflective of the population.

y-intercept

the predicted value of y when x is zero

z score

a measure of how many standard deviations an observation lies above or below the mean: z = (x - mean) / standard deviation

Addition Rule

If A and B are disjoint events: P(A or B)=P(A) + P(B)

Conditional Distribution

the distribution of a variable among only those individuals who satisfy a given condition (for example, those in one category of another variable)

Conditional Probability

A probability that takes into account a given condition.

Disjoint Events

mutually exclusive, events that have no outcomes in common

General Addition Rule

For any two events (meaning disjoint or not disjoint), A and B, the probability of A or B is:
P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
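
A check of the general addition rule on one roll of a fair die, using Python sets for events and exact fractions. The events A and B are illustrative choices that overlap, so they are not disjoint.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair die
A = {2, 4, 6}  # even
B = {4, 5, 6}  # greater than 3

def P(event: set) -> Fraction:
    """Classical probability: all outcomes equally likely."""
    return Fraction(len(event), len(sample_space))

lhs = P(A | B)                     # P(A ∪ B)
rhs = P(A) + P(B) - P(A & B)       # P(A) + P(B) - P(A ∩ B)
print(lhs, rhs)  # 2/3 2/3
```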

General Multiplication Rule

If A and B are any two events, then
P(A and B) = P(A) × P(B|A)

Independence (Informally)

Two events are independent if knowing whether one event occurs does not alter the probability that the other event occurs

Independence (Formally)

P(B|A) = P(B) when A and B are independent.

Sample Space

The collection of all possible outcomes

Tree Diagram

a diagram used to show the total number of possible outcomes in a probability experiment

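Addition and Subtraction (of a constant)

-Mean, Median, and Mode are affected (each shifts by the constant)
-Cannot add or subtract SDs when combining random variables; only square, add, and square root (for independent random variables, variances add, even for a difference)
-Range is NOT affected (nor are the IQR and SD)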
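
A short illustration with hypothetical data. Adding a constant shifts the mean but leaves the SD and range unchanged. For independent random variables with assumed SDs of 3 and 4, the SD of X - Y is found by adding variances, not SDs.

```python
import math
import statistics

data = [10, 12, 15, 18, 20]  # hypothetical data
shifted = [x + 5 for x in data]

# Adding a constant shifts the center but leaves the spread unchanged.
print(statistics.mean(data), statistics.mean(shifted))    # 15 20
print(max(data) - min(data), max(shifted) - min(shifted))  # 10 10

# Independent X and Y: Var(X - Y) = Var(X) + Var(Y), so square, add, square root.
sd_x, sd_y = 3.0, 4.0  # hypothetical SDs
sd_diff = math.sqrt(sd_x**2 + sd_y**2)
print(sd_diff)  # 5.0
```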

Continuous Random Variable

-Random variable that assumes values associated with one or more intervals on the number line

Discrete Random Variable

-Random variable with a countable number of outcomes

Event

-Subset of the sample space

Independent Events

-If the knowledge of one event having occurred does not change the probability that the other event occurs

Law of Large Numbers

-States that the proportion of successes in the simulation should become, over time, close to the true proportion in the population

Multiplication and Division

-Mean, Median, Mode, Range, and SD are affected (each is multiplied or divided by the constant; for range and SD, by its absolute value)
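
A sketch with hypothetical data and the constant c = 3. Multiplying every value by c multiplies the mean, median, range, and SD by c, and the variance by c².

```python
import statistics

data = [10, 12, 15, 18, 20]  # hypothetical data
c = 3
scaled = [c * x for x in data]

print(statistics.mean(scaled) == c * statistics.mean(data))      # True
print(statistics.median(scaled) == c * statistics.median(data))  # True
print(max(scaled) - min(scaled) == c * (max(data) - min(data)))  # True
print(statistics.stdev(scaled), c * statistics.stdev(data))      # equal up to rounding
```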

Mutually Exclusive Event

-If they have no outcomes in common
-If one happens, the other cannot

Probability Distribution for a Discrete Random Variable

-Possible values of the discrete random variable together with their respective probabilities
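
As an illustration, take the hypothetical random variable X = sum of two fair dice. The sketch lists each possible value of X with its exact probability. The probabilities of a valid distribution sum to 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die1, die2) pairs give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
distribution = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
print(sum(distribution.values()))  # 1
```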

Probability Distribution for a Random Variable

-Possible values of the random variable X together with the probabilities corresponding to those values

Random Phenomenon

-An activity whose outcome we can observe or measure but we do not know how it will turn out on any single trial

complementary

two events that cannot occur together, but one must happen

dependent

events do impact each other's probability

Independent

events do not impact each other's probability

Mutually exclusive/disjoint

two events that cannot occur simultaneously

Bayes's Theorem

Suppose that A₁, A₂, ..., Aₖ are disjoint events whose probabilities are not 0 and add to exactly 1, i.e., any outcome must be in exactly one of those events. Then, if B is any other event whose probability is not 0 or 1,
P(Aᵢ|B) = P(B|Aᵢ)P(Aᵢ) / [P(B|A₁)P(A₁) + ... + P(B|Aₖ)P(Aₖ)]
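
A minimal sketch of Bayes's theorem over a partition A₁, ..., Aₖ. The scenario is hypothetical: three machines make 50%, 30%, and 20% of the parts, with defect rates of 1%, 2%, and 3%. The question is the probability that a defective part came from machine 3.

```python
def bayes_posterior(priors: list[float], likelihoods: list[float], i: int) -> float:
    """P(A_i | B) where priors[j] = P(A_j) (summing to 1) and likelihoods[j] = P(B | A_j)."""
    p_b = sum(p * lik for p, lik in zip(priors, likelihoods))  # denominator: total P(B)
    return likelihoods[i] * priors[i] / p_b

priors = [0.5, 0.3, 0.2]            # hypothetical P(machine j)
defect_rates = [0.01, 0.02, 0.03]   # hypothetical P(defective | machine j)
print(round(bayes_posterior(priors, defect_rates, 2), 4))  # 0.3529 (index 2 = machine 3)
```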

Conditional Probability

The probability of some event given that some other event occurs.

Disjoint ≠ Independent

If two events are disjoint, the occurrence of one means the other cannot occur, so disjoint events with positive probabilities are dependent. If events are independent, knowing whether one occurred does not change the probability of the other.

General Addition Rule for Any Two Events

For any two events A and B,
P(A or B) = P(A) + P(B) - P(A and B)

General Multiplication Rule for Any Two Events

The probability that both of two events A and B happen together can be found by
P(A and B) = P(A)P(B|A)

Independence Definition

When the outcome of one event cannot influence the outcome of a second event.

Outcomes for a diagnostic test

There are four possible outcomes:
- true positive
- true negative
- false positive
- false negative

Positive Predictive Value

PPV = # of true positives / total # of positive test results (true positives + false positives)

Prevalence

# of diseased individuals / total # of individuals

Sensitivity

P(positive|diseased)
Want it to be as high as possible, so that diseased individuals are correctly diagnosed (few false negatives).

Specificity

P(negative|non-diseased)
Want to be as high as possible to avoid false positives.

The Multiplication Rule for Independent Events

P(A and B) = P(A)P(B)

Tree Diagrams

Diagrams that show the probabilities of the first event (such as P(A)) as the first set of branches, then conditional probabilities such as P(B|A) as branches coming off those branches, and so on until a final event is reached. The probability of any complete path is found by multiplying the probabilities of each branch along the way.

Venn Diagram

A diagram showing a sample space S and events as areas within S. Overlaps indicate non-disjoint events.

When P(A) > 0, the conditional probability of event B occurring given that A occurs is

P(B|A) = P(A and B) / P(A)

"At least one"

P(at least one) = 1 - P(none)

"None"

P(none) = 1 - P(at least one), since "none" is the complement of "at least one"
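
A worked example of the complement trick, assuming independent trials with the same success probability on each: P(at least one) = 1 - (1 - p)^n. The die example is illustrative.

```python
from fractions import Fraction

def p_at_least_one(p_single: Fraction, trials: int) -> Fraction:
    """1 - P(none), where P(none) = (1 - p)^trials for independent trials."""
    p_none = (1 - p_single) ** trials
    return 1 - p_none

print(p_at_least_one(Fraction(1, 6), 4))  # 671/1296: at least one six in four rolls
```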

Addition Rule 1

When two events A and B are mutually exclusive, the probability that A or B will occur is
P(A or B) = P(A) + P(B)

Addition Rule 2

If A and B are NOT mutually exclusive, then
P(A or B) = P(A) + P(B) - P(A and B)

Classical Probability

P(E) = (# of outcomes in E) / (total # of outcomes in the sample space)
Use this when all outcomes in the sample space are equally likely.

Combinations Rule

Used when selecting a smaller number from a larger number but the order is NOT important.
nCr = n! / (r!(n - r)!)
n = total number of objects, r = number of objects being selected
On calculator: enter amount(n), math, PRB, 3, enter amount(r), enter
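
The combinations formula can be computed directly from factorials. Python 3.8+ also provides `math.comb`, which gives the same result. The helper name `n_choose_r` is illustrative.

```python
import math

def n_choose_r(n: int, r: int) -> int:
    """nCr = n! / (r! (n - r)!): ways to choose r of n objects when order does not matter."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(n_choose_r(10, 3), math.comb(10, 3))  # 120 120
```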

Complement Rule

P(Ē) = 1 - P(E)
The complement Ē is the set of outcomes in the sample space that are not included in the outcomes of E

Conditional Probability

The probability that the second event B occurs given that the first event A has occurred can be found by dividing the probability that both events occurred by the probability that the first event has occurred. The formula is
P(B|A) = P(A and B) / P(A)

Dependent Events

When the outcome or occurrence of the first event affects the outcome or occurrence of the second event in such a way that the probability is changed.
*without replacement = dependent events
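
A sketch using a hypothetical bag of 5 red and 3 blue marbles. Drawing without replacement changes the bag, so the second draw depends on the first and P(A and B) = P(A) · P(B|A). With replacement, the draws are independent.

```python
from fractions import Fraction

red, blue = 5, 3  # hypothetical bag contents
total = red + blue

p_first_red = Fraction(red, total)
p_second_red_given_first_red = Fraction(red - 1, total - 1)  # one red already removed
p_both_red = p_first_red * p_second_red_given_first_red      # P(A and B) = P(A) * P(B|A)
print(p_both_red)  # 5/14

# With replacement, the second draw has the same probability as the first.
print(Fraction(red, total) ** 2)  # 25/64
```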

Empirical Probability

P(E) = (frequency for the class) / (total frequencies in the distribution) = f/n
Relies on actual experience to determine the likelihood of outcomes.

Factorial Rule

Use this when you have "n" objects and you want to know how many different ways they can be arranged.
n!
On calculator: enter amount, math, arrow left to PRB, 4, enter

Fundamental Counting Rule

Use this when you have different positions and you want to know how many possible arrangements there are: multiply the number of options for each position.
Example: 9 positions with 2 options each give 2 × 2 × ... × 2 = 2⁹ = 512
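
A check of the 2⁹ = 512 example. It multiplies the options per position and then confirms the count by listing every arrangement with `itertools.product`. Reading the two options as true/false answers is an assumed interpretation.

```python
import math
from itertools import product

options_per_position = [2] * 9  # 9 positions, 2 options each
total = math.prod(options_per_position)
print(total)  # 512

# Brute-force check: enumerate every arrangement of T/F across 9 positions.
brute_force = sum(1 for _ in product("TF", repeat=9))
print(brute_force)  # 512
```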

Independent Events

Two events A and B are independent events if the fact that A occurs does NOT affect the probability of B occurring.
*with replacement = independent events

Multiple Combinations Rule

When a problem involves more than one combination (e.g., choosing some items from one group and some from another), multiply the combinations.
n₁Cr₁ * n₂Cr₂

Multiplication Rule 1

When two events are independent, the probability of both occurring is
P(A and B) = P(A) * P(B)

Multiplication Rule 2

When two events are dependent, the probability of both occurring is
P(A and B) = P(A) * P(B|A)

Mutually Exclusive Events

Two events that cannot occur at the same time (i.e., they have no outcomes in common).

Permutations Rule

Used when selecting a smaller group from a larger group and putting them in a specific order.
*ORDER IS IMPORTANT
nPr = n! / (n - r)!
n = total number of objects, r = number of objects being selected
On calculator: enter amount(n), math, PRB, 2, enter amount(r), enter
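
The permutations formula can be computed from factorials. Python 3.8+ provides `math.perm`, and enumerating `itertools.permutations` confirms the count. The club-officer example is hypothetical.

```python
import math
from itertools import permutations

def n_permute_r(n: int, r: int) -> int:
    """nPr = n! / (n - r)!: ordered arrangements of r objects chosen from n."""
    return math.factorial(n) // math.factorial(n - r)

# Hypothetical: president, vice-president, secretary chosen from 10 club members.
print(n_permute_r(10, 3), math.perm(10, 3))   # 720 720
print(len(list(permutations(range(10), 3))))  # 720
```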

Subjective Probability

Uses a probability value based on an educated guess or estimate, employing opinions and inexact information.

These are counting rules...

They count outcomes; they are NOT trying to find probability!

bar graph

quickly compares data in column form; the bar heights can show counts or percents

box plots

made from the five-number summary
modified box plot: plots outliers as individual points (commonly, points more than 1.5 × IQR below Q1 or above Q3)
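
A sketch of the 1.5 × IQR rule that modified box plots commonly use. The quartiles are passed in because quartile conventions vary. The values below come from the median-of-halves method on the hypothetical data set.

```python
def outliers_1_5_iqr(values: list[float], q1: float, q3: float) -> list[float]:
    """Points more than 1.5 * IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # the fences
    return [v for v in values if v < low or v > high]

data = [1, 3, 4, 7, 8, 9, 12, 15, 40]  # hypothetical data
# Median-of-halves quartiles for this data: Q1 = 3.5, Q3 = 13.5, so the fences are -11.5 and 28.5.
print(outliers_1_5_iqr(data, q1=3.5, q3=13.5))  # [40]
```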

cases

The objects described by a set of data, especially when those objects are people

categorical variable

places an individual into one of two or more groups or categories

density curve

describes the overall pattern of a distribution; areas underneath it give the proportions of observations that fall in any range of values

distribution

The distribution of a variable tells us what values it takes and how often it takes these values.
For a categorical variable, it gives either the count or the percent of individuals that fall in each category.

examining terms for distribution

overall pattern, deviations from that pattern, shape, center, spread, outliers