The distribution of a count depends on
how the data are produced.
The Binomial Setting
1. There are a fixed number n of observations.
2. The n observations are all independent. That is, knowing the result of one observation does not change the probabilities we assign to other observations.
3. Each observation falls into one of just two categories, which for convenience are called "success" and "failure".
4. The probability of a success, p, is the same for each observation.
The count X of successes in the binomial setting has the binomial distribution with parameters n and p. The parameter n is the number of observations and p is the probability of a success on any one observation. The possible values of X are the whole numbers ranging from 0 to n.
Not all counts have binomial distributions.
Pay attention to the binomial setting.
Sampling Distribution of a Count
Choose an SRS of size n from a population with proportion p of successes. When the population is much larger than the sample, the count X of successes in the sample has approximately the binomial distribution with parameters n and p.
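As a rough check on the "population much larger than the sample" condition, the sketch below compares the exact distribution of the count in an SRS drawn without replacement (the hypergeometric distribution) with the binomial approximation. The population size N, the number of successes K, and the sample size n are hypothetical values chosen only for illustration.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Exact P(X = k) for an SRS of size n drawn without replacement
    from a population of N individuals, K of which are successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """Binomial P(X = k) with n observations and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical numbers: population of 10,000 with 3,000 successes, SRS of 10.
N, K, n = 10_000, 3_000, 10
p = K / N

# Largest pointwise difference between the exact and approximate probabilities.
max_gap = max(abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p))
              for k in range(n + 1))
print(f"largest difference in P(X = k): {max_gap:.6f}")
```

Shrinking N toward n in this sketch makes the gap grow, which is why the approximation is only recommended for populations much larger than the sample.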
The binomial coefficient is the number of ways of arranging k successes among n observations when the order of the successes does not matter.
For a given number n, its factorial n! is
n! = n ⋅ (n-1) ⋅ (n-2) ... 3 ⋅ 2 ⋅ 1
And 0! = 1
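A minimal sketch of how factorials give the number of arrangements of k successes among n observations, using the standard formula n! / (k! (n-k)!). The helper name n_choose_k is made up for this example; Python's built-in math.comb computes the same quantity.

```python
from math import comb, factorial

def n_choose_k(n, k):
    """Number of ways to place k successes among n observations."""
    return factorial(n) // (factorial(k) * factorial(n - k))

print(factorial(5))       # 5 * 4 * 3 * 2 * 1 = 120
print(factorial(0))       # 1, by definition
print(n_choose_k(5, 2))   # 10 ways to place 2 successes among 5 observations
print(n_choose_k(5, 2) == comb(5, 2))  # True: matches the built-in
```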
If X has the binomial distribution with n observations and probability p of success on each observation, the possible values of X are 0, 1, 2, ..., n. If k is any one of these values, P(X = k) = C(n, k) ⋅ p^k ⋅ (1-p)^(n-k), where C(n, k) = n! / (k! (n-k)!).
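The standard binomial probability formula, P(X = k) = C(n, k) ⋅ p^k ⋅ (1-p)^(n-k), can be sketched in a few lines. The values n = 10 and p = 0.25 are hypothetical, picked only to have something to compute.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial count with n observations and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.25  # hypothetical example values

# Probability of exactly 2 successes in 10 observations.
print(round(binom_pmf(2, n, p), 4))  # 0.2816

# The probabilities over all possible values 0, 1, ..., n add up to 1.
total = sum(binom_pmf(k, n, p) for k in range(n + 1))
print(round(total, 10))  # 1.0
```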
Binomial Mean and Standard Deviation
The center and spread of the binomial distribution for a count X are described by its mean µ and standard deviation σ:
µ = np
σ = √(np(1-p))
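A small sketch, assuming hypothetical values n = 100 and p = 0.2, that computes µ and σ from the formulas and checks them against the mean and variance obtained directly from the binomial probabilities.

```python
from math import comb, sqrt

def binomial_mean_sd(n, p):
    """Mean and standard deviation of a binomial count."""
    return n * p, sqrt(n * p * (1 - p))

n, p = 100, 0.2  # hypothetical example values
mu, sigma = binomial_mean_sd(n, p)
print(mu, sigma)  # approximately 20 and 4

# Cross-check: compute the mean and variance from the probabilities themselves.
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mean_direct = sum(k * pk for k, pk in enumerate(pmf))
var_direct = sum((k - mean_direct) ** 2 * pk for k, pk in enumerate(pmf))
print(round(mean_direct, 6), round(sqrt(var_direct), 6))
```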
Normal Approximation for Binomial Distributions
If n is large and p is not too close to 0 or 1, the binomial distribution can be approximated by a Normal distribution.
B(n, p) ≈ N(µ = np, σ = √(np(1-p)))
It can generally be used when np ≥ 10 and n(1-p) ≥ 10.
This approximation can be improved with a continuity correction.
Continuity correction can produce a more accurate Normal approximation. Counts can only take integer values, but the Normal distribution can take any real value, so the proper continuous equivalent of a count k is the interval of width 1 centered on it, from k - 0.5 to k + 0.5. This is especially helpful when the sample size is small.
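To see the effect of the continuity correction, the sketch below compares the exact binomial probability P(X ≤ 15) with the Normal approximation, with and without the correction. The values n = 100, p = 0.2, and the cutoff 15 are hypothetical; they satisfy np ≥ 10 and n(1-p) ≥ 10.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for a binomial count."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

n, p, k = 100, 0.2, 15  # hypothetical example values
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

exact = binom_cdf(k, n, p)
plain = approx.cdf(k)            # no correction: area to the left of 15
corrected = approx.cdf(k + 0.5)  # correction: include the whole interval for the count 15

print(f"exact     {exact:.4f}")
print(f"plain     {plain:.4f}")
print(f"corrected {corrected:.4f}")
```

In this example the corrected value lands noticeably closer to the exact probability than the uncorrected one.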
A Poisson distribution describes the count X of occurrences of a defined event in fixed, finite intervals of time or space when
1. occurrences are all independent, and
2. the probability of an occurrence is the same over all possible intervals.
If X has the Poisson distribution with mean number of occurrences per interval µ, the possible values of X are 0, 1, 2, ... If k is any one of these values, P(X = k) = e^(-µ) ⋅ µ^k / k!.
The mean and variance of the Poisson distribution are both equal to µ, the mean number of occurrences per interval. The distribution's standard deviation σ is equal to √µ.
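A minimal sketch of the standard Poisson probability formula, P(X = k) = e^(-µ) ⋅ µ^k / k!, with a numerical check that the mean and variance both come out equal to µ. The value µ = 3 is hypothetical, and the sums are cut off at k = 59, where the remaining probability is negligible for this µ.

```python
from math import exp, factorial, sqrt

def poisson_pmf(k, mu):
    """P(X = k) for a Poisson count with mean mu occurrences per interval."""
    return exp(-mu) * mu**k / factorial(k)

mu = 3.0  # hypothetical mean number of occurrences per interval

# Truncate the infinite range of k; the tail beyond 59 is negligible for mu = 3.
ks = range(60)
mean = sum(k * poisson_pmf(k, mu) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, mu) for k in ks)

print(round(mean, 6), round(variance, 6))       # both approximately 3
print(round(sqrt(variance), 6), round(sqrt(mu), 6))  # standard deviation vs. sqrt(mu)
```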
Because the mean and variance of a Poisson distribution are equal, when the mean number of occurrences is large, the variance is also large, and the distribution looks very flat and wide.
Therefore, Poisson distributions are typically used to describe rare, random phenomena.
Remember that real-life data are not perfect.
Rather, people use mathematical models to represent biological features.
Reminder: there are two types of data
Quantitative: observations that can be counted or measured across individuals in a population
Categorical: observations that fall into one of several categories
The way to set up statistical problems
- what are the n individuals/units in the sample (of size n)?
- what is being recorded about those n individuals/units?
- is that a number (quantitative) or a statement (categorical)?
Binomial Distributions are models for ___?
Some categorical variables, typically representing the number of successes in a series of n independent trials.
Observations must meet these requirements
- the total number of observations n is fixed in advance
- each observation falls into just 1 of 2 categories: success or failure
- the outcomes of all n observations are statistically independent
- all n observations have the same probability of "success", p.
Binomial distributions describe ___? And are used when ___?
The possible number of times that a particular event will occur in a sequence of observations. They are used when we want to know the probability that the event occurs a given number of times.
Parameters of a binomial distribution for X successes in n observations
n is the number of observations.
p is the probability of success on each observation.
X is the count of successes, and can be any whole number between 0 and n.