STAT100 Second Exam Notes
Statistical Inference
-Using the sample to infer something about the population
-Reasoning rests on asking "How often would this method give a correct answer if I used it very many times?"
-The process of statistical inference involves using information from a sample to draw conclusions about a wider population.
-We need to be able to describe the sampling distribution of possible statistic values in order to perform statistical inference.
Parameter
-A number that describes the population
-In practice, the value is an unknown number
-For example:
- µ = population mean
- σ (sigma) = population standard deviation
- p = population proportion
Statistic
-Known value calculated from a sample.
-A statistic is often used to estimate a parameter.
-For Example:
-x bar = sample mean
-s = sample standard deviation
-p hat = sample proportion
Variability
- different samples from the same population may yield different values of the sample statistic
Sampling Distribution
- tells what values a statistic takes and how often it takes those values in repeated sampling
Sampling Distribution of a Statistic
- The distribution of values taken by the statistic in all possible samples of the same size from the same population.
Law of Large Numbers
-As the sample size increases, the sample mean gets closer to the population mean. That is, the difference between the sample mean and the population mean tends to become smaller (i.e., approaches zero).
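The Law of Large Numbers is easy to see in a simulation. The sketch below is illustrative (not from the notes): it uses a fair six-sided die as the population, whose true mean is 3.5, and prints the sample mean for increasingly large samples.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

POP_MEAN = 3.5  # true mean of a fair six-sided die

def sample_mean(n):
    """Mean of n rolls of a fair die (one simple random sample of size n)."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# As n grows, the sample mean settles near the population mean 3.5.
for n in (10, 1_000, 100_000):
    print(n, round(sample_mean(n), 3))
```

Removing the seed and rerunning shows the small-sample means bouncing around while the large-sample means stay close to 3.5.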
Sampling Variability
-the value of a statistic varies in repeated random sampling.
Sampling Distribution of the Sample Mean
-If you repeatedly take simple random samples of size n from a population and calculate the sample mean for each sample, the distribution of these accumulated sample means is the sampling distribution of the sample mean.
-The sampling distribution of the sample mean does not necessarily look like the population distribution.
Central Limit Theorem
-Regardless of the shape of the population distribution, the sampling distribution of the sample mean becomes approximately normal as the sample size n increases.
- If the random variable X (i.e., the population) is normally distributed, then the sampling distribution of the sample mean is normally distributed for any sample size.
- For all other random variables X (i.e., other populations), the sampling distribution of the sample mean is approximately normally distributed if n is 30 or higher. (The convention in our class for n large enough)
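A small simulation illustrates the Central Limit Theorem in action. The population below is a hypothetical choice for illustration: an exponential distribution with mean 1, which is strongly right-skewed and definitely not Normal. The sampling distribution of the mean for n = 30 still centers near μ = 1 with spread near σ/√n = 1/√30 ≈ 0.183.

```python
import random
import statistics

random.seed(7)

def skewed_draw():
    """One draw from a skewed parent population: exponential, mean 1, sd 1."""
    return random.expovariate(1.0)

def one_sample_mean(n):
    """Sample mean of one simple random sample of size n."""
    return statistics.fmean(skewed_draw() for _ in range(n))

# Build the sampling distribution of the sample mean for n = 30
# (the class convention for "n large enough").
means = [one_sample_mean(30) for _ in range(5_000)]

# CLT prediction: center near mu = 1, spread near sigma/sqrt(n) = 1/sqrt(30).
print(round(statistics.fmean(means), 3))
print(round(statistics.stdev(means), 3))
```

A histogram of `means` would look roughly bell-shaped even though the parent population is skewed.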
Behavior of Sampling Distribution
1) The distribution of measurements in a sample looks like the distribution in the parent population, NOT necessarily like a Normal curve.
2) The sampling distribution of the sample mean looks like a Normal curve as the sample size increases, even though the parent population is definitely NOT Normal.
3) As the sample size increases, the sample mean gets closer to the population mean, i.e., the difference between the sample mean and the population mean tends to become smaller (i.e., approaches zero). (Law of Large Numbers!)
4) The spread in the histograms for the sampling distribution of the sample mean is getting smaller for larger sample sizes. (Law of Large Numbers - Causing less variation in the measurement)
Properties of Sampling Distribution
- If a simple random sample of size n is drawn from any large population, then the sampling distribution of the sample mean has:
-Mean
- mu_x bar = mu
-(The mean of the sampling distribution of the sample mean equals the population mean. It is an unbiased estimator of mu)
-Standard Deviation (also called the Standard Error of the Mean)
- σ_x bar = σ/sqrt(n)
-In addition, if the population is normally distributed, then, the sampling distribution is normally distributed.
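These two properties can be checked empirically. The numbers below (μ = 50, σ = 10, n = 25) are hypothetical; the theory says the simulated sample means should average about μ with standard deviation about σ/√n = 2.

```python
import math
import random
import statistics

random.seed(1)

MU, SIGMA, N = 50.0, 10.0, 25  # hypothetical Normal population and sample size

# Theory: mean of the x-bars is mu; standard error is sigma/sqrt(n) = 2.0 here.
theoretical_se = SIGMA / math.sqrt(N)

# Simulate many sample means from N(mu, sigma).
xbars = [statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
         for _ in range(4_000)]

print(round(statistics.fmean(xbars), 2))   # close to mu = 50
print(round(statistics.stdev(xbars), 2))   # close to sigma/sqrt(n) = 2.0
```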
Binomial Experiment
- Fixed number of trials, n
- Only two outcomes for each trial, success or failure
- The n trials are independent
-The probability of a success, p, is the same for each trial
Binomial Distribution
-Let x = the count of successes in a binomial setting. The distribution of X is the binomial distribution with parameters n and p. X~Binomial (n, p)
-n is the number of trials/observations
-p is the probability of a success on any one observation (p must be the same for each trial)
-The random variable X takes on whole values between 0 and n
Mean and Standard Deviation (Binomial Distribution)
-If X has the binomial distribution with n observations and probability p of success on each observation, then the mean and standard deviation of X are:
mu = n*p
sigma = sqrt(n*p*(1-p))
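Plugging numbers into these two formulas is straightforward. The setting below (n = 100, p = 0.3) is a made-up example:

```python
import math

# Hypothetical binomial setting: n = 100 trials, success probability p = 0.3.
n, p = 100, 0.3

mu = n * p                           # mean number of successes
sigma = math.sqrt(n * p * (1 - p))   # standard deviation of the count

print(round(mu, 1))      # 30.0
print(round(sigma, 3))   # 4.583
```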
Normal Approximation for Binomial Distribution
-As n gets larger, something interesting happens to the shape of a binomial distribution.
-Suppose that X has the binomial distribution with n trials and success probability p. When n is large, the distribution of X is approximately Normal with mean np and standard deviation sqrt(np(1-p)).
-As a rule of thumb, we will use the Normal approximation when n is so large that np ≥ 10 and n(1 - p) ≥ 10
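The rule of thumb is a simple pair of inequalities, sketched here as a small helper (the function name and example numbers are illustrative):

```python
def normal_approx_ok(n, p):
    """Rule of thumb: use the Normal approximation when np >= 10 and n(1-p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

print(normal_approx_ok(100, 0.3))   # True: np = 30 and n(1-p) = 70 both >= 10
print(normal_approx_ok(20, 0.05))   # False: np = 1 < 10
```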
Sample Proportion
-p hat = count of successes in a sample/size of sample
-Also interpreted as: X/n
Sampling Distribution of the Sample Proportion
-As the sample size increases, the sampling distribution of p hat (Sample proportion) becomes approximately Normal.
-The mean of the sampling distribution is p
-The standard deviation of the sampling distribution is sqrt((p(1-p))/n)
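A simulation confirms the center and spread of p hat. The population proportion p = 0.4 and sample size n = 200 below are hypothetical; the theory predicts mean p = 0.4 and standard deviation sqrt(p(1-p)/n) ≈ 0.0346.

```python
import math
import random
import statistics

random.seed(3)

p, n = 0.4, 200  # hypothetical population proportion and sample size

def p_hat():
    """Sample proportion from one simple random sample of size n."""
    successes = sum(random.random() < p for _ in range(n))
    return successes / n

phats = [p_hat() for _ in range(5_000)]

print(round(statistics.fmean(phats), 3))            # close to p = 0.4
print(round(statistics.stdev(phats), 4))            # close to the theoretical sd
print(round(math.sqrt(p * (1 - p) / n), 4))         # theoretical sd: 0.0346
```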
Conditions for the Sampling Distribution of the Sample Proportion to be Normal
-Independence: The individual samples must be independent of one another.
-Randomization Condition: Samples were obtained by Simple Random Samples or Randomized experiments.
-10% Condition: Your sample size should not be more than 10% of the population.
-Success/Failure Condition: np≥10 and n(1-p) ≥10
Simple Conditions for Inference about the Mean
1. We have a Simple Random Sample (SRS) from the population of interest. There is no nonresponse or other practical difficulty.
2. The variable we measure has an exactly Normal distribution N(µ,σ) in the population, or the sample size is 30 or more so that the Central Limit Theorem can be invoked for the sampling distribution. In other words, the sampling distribution of the sample mean must be Normally distributed or approximately Normally distributed.
3. We do not know the population mean, µ. But we do know the population standard deviation, σ.
Conditions for Inference in Practice
• Only under specific conditions can a confidence interval or significance test be trusted.
- You must understand the conditions that must be met
- You must judge whether they fit your specific problem
• Inference is most reliable when the data come from a random sample or a randomized comparative experiment.
- Random samples use chance to choose respondents.
- Randomized comparative experiments use chance to assign subjects to treatments.
• The deliberate use of chance ensures that the laws of probability apply to the outcomes, and this ensures that statistical inference makes sense.
• The data must be an SRS from the population (You should ask: "where did the data come from?").
- Different methods are needed for different designs.
- The z procedures are not correct for samples other than SRS.
• Outliers can distort the result.
- The sample mean is strongly influenced by outliers.
- Always explore your data before performing an analysis.
• The shape of the population distribution matters.
- Skewness and outliers make the z procedures untrustworthy unless the sample is large.
- In practice, the z procedures are reasonably accurate for any sample of at least moderate size from a fairly symmetric distribution.
• The population standard deviation σ must be known.
- Unfortunately σ is rarely known, so z procedures are rarely useful.
- Later we will introduce procedures for when σ is unknown.
Confidence Interval for the population mean
-Point estimate ± margin of error.
- x bar +- Z_α/2 * (σ/sqrt(n))
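The formula translates directly into code. The numbers in the example (x̄ = 68, σ = 3, n = 36, 95% confidence) are hypothetical:

```python
import math

def z_confidence_interval(xbar, sigma, n, z):
    """x-bar ± z * sigma/sqrt(n); sigma is the (known) population sd."""
    me = z * sigma / math.sqrt(n)  # margin of error
    return xbar - me, xbar + me

# Hypothetical numbers: x-bar = 68, sigma = 3, n = 36, 95% => z = 1.96.
lo, hi = z_confidence_interval(68, 3, 36, 1.96)
print(round(lo, 2), round(hi, 2))  # 67.02 68.98
```

Here the margin of error is 1.96 × 3/√36 = 0.98, so the interval is 68 ± 0.98.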
Finding Other levels of Confidence
• In general, for a (1 - α) • 100% confidence interval, we need to find the critical value zα/2, i.e., the Z-score such that the area to the right of it is α/2:
- P(Z ≥ zα/2) = α/2 = P(Z ≤ -zα/2)
• Frequently used levels of confidence, and their related critical values, are:
- 90% corresponds to zα/2 = z0.05 = 1.645
- 95% corresponds to zα/2 = z0.025 = 1.960
- 99% corresponds to zα/2 = z0.005 = 2.575
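These critical values come from the inverse CDF of the standard Normal, which Python's standard library exposes via `statistics.NormalDist`. A short sketch:

```python
from statistics import NormalDist

def z_critical(confidence):
    """z_{alpha/2}: the z-score with area alpha/2 to its right."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for c in (0.90, 0.95, 0.99):
    print(c, round(z_critical(c), 3))
```

Note that the exact 99% value rounds to 2.576; the 2.575 in the card is the common z-table rounding.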
Properties of a Confidence Interval
• A (1 - α) • 100% confidence interval will not contain the true parameter every time.
Correct Interpretation of a Confidence Interval: If you construct many (1 - α) • 100% confidence intervals, approximately (1 - α) • 100% of the intervals will cover the true parameter value.
• Whether a confidence interval contains the true mean depends solely on the sample mean.
• Both the sample size and the level of confidence affect the width of the interval.
- The larger the sample size, the narrower the interval.
- The smaller the level of confidence, the narrower the interval.
How Confidence Intervals Behave
• The margin of error is: Z_α/2 * (σ/sqrt(n))
• The margin of error gets smaller, resulting in more accurate inference,
- when n gets larger
- when Z_α/2 gets smaller (confidence level gets smaller)
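The first behavior follows from the √n in the denominator: quadrupling n halves the margin of error. A sketch with hypothetical values (σ = 12, 95% confidence):

```python
import math

SIGMA, Z95 = 12.0, 1.96  # hypothetical population sd; 95% critical value

def margin_of_error(z, sigma, n):
    """Margin of error for a z confidence interval: z * sigma / sqrt(n)."""
    return z * sigma / math.sqrt(n)

# Quadrupling n (25 -> 100) halves the margin of error.
print(round(margin_of_error(Z95, SIGMA, 25), 3))    # 4.704
print(round(margin_of_error(Z95, SIGMA, 100), 3))   # 2.352
```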
Constructing Confidence Intervals
Estimation is the process of using sample data (known) to estimate the value of a population parameter (unknown).
Estimation involves two steps:
Step 1 - Obtain the value of a statistic that estimates the value of a parameter; this is called the point estimate. (Relatively easy.)
Step 2 - Quantify the accuracy and precision of the point estimate using a confidence interval. (Requires knowledge of the sampling distribution of the statistic.)
Level of Confidence
-The level of confidence represents the expected proportion of (random) intervals that will contain the parameter if a large number of different samples is obtained.
- The level of confidence is always expressed as a percent.
Although the choice of the level of confidence is at the discretion of the experimenter, the most commonly used values are 90%, 95%, and 99%.
● The level of confidence is associated with a number α, the "error rate".
● For "error rate" α, the level of confidence is
(1 - α) • 100%.
When α = 0.05, then (1 - α) = 0.95, and we have a 95% level of confidence.
When α= 0.01, then (1 - α) = 0.99, and we have a 99% level of confidence.
Estimation
-the process of using sample data to estimate the value of a population parameter (Confidence Interval)
Hypothesis Testing
-Hypothesis testing is the process of using sample data to test a claim about the value of a population parameter (Tests of Significance, Hypothesis Test).
-A hypothesis test for a parameter is a procedure, based on sample evidence and probability, used to test a specific claim about the value of the parameter.
Steps of Hypothesis Testing
a) A claim is made. (Statement about the nature of some
population.)
b) Evidence (sample data from population) is collected to "test" the validity of the claim.
c) The data are analyzed to assess the plausibility of the claim.
d) A conclusion about the claim is stated.
Null Hypothesis
-a statement (regarding the value of a parameter) that is believed to be true:
• It is written as H0 (read as "H-naught" or "H-sub-Oh").
• Always a statement of equality.
• It is a statement of status quo or no difference.
• Assumed to be plausible until we have evidence to the contrary.
• The statement being tested in a statistical test is called the null hypothesis.
• The test is designed to assess the strength of evidence against the null hypothesis.
• Usually the null hypothesis is a statement of "no effect" or "no difference", or it is a statement of equality.
Alternative Hypothesis
-a claim (regarding the value of a parameter) to be tested:
• It is written as Ha (read as "H-a" or "H-sub-a"), sometimes H1.
• It never contains a statement of equality.
• It represents the claim that we seek evidence for.
• There are different types of alternative hypotheses, depending on the wording of the claim.
• The statement we are trying to find evidence for is called the alternative hypothesis.
• Usually the alternative hypothesis is a statement of "there is an effect" or "there is a difference", or it is a statement of inequality.
Right-tailed Test
-tests whether the parameter equals some value versus whether it is greater than that value.
H0: parameter = some value
Ha: parameter > some value
Left-tailed Test
-tests whether the parameter equals some value versus whether it is less than that value.
- H0: parameter = some value
- Ha: parameter < some value
Two-tailed Test
-tests whether the parameter equals some value versus whether it is not equal to that value.
H0: parameter = some value
Ha: parameter ≠ some value
Outcomes of Hypothesis Testing
● If we do not have enough evidence to support the alternative hypothesis, then we will not reject the null hypothesis.
● If we have enough evidence to support the alternative hypothesis, then we will reject the null hypothesis.
Type I Error
-If we reject H0 when in fact H0 is true.
• If we decide there is a significant relationship in the population (reject the null hypothesis):
- This is an incorrect decision only if H0 is true.
- The probability of this incorrect decision is equal to α. That is, P(Type I error) = α
- In practice, typical α's are 0.01, 0.05, 0.1
• If the null hypothesis is true and α = 0.05:
- There really is no relationship and the extremity of the test statistic is due to chance.
- About 5% of all samples from this population will lead us to wrongly reject chance and conclude significance.
Type II Error
• If we fail to reject H0 when in fact Ha is true, this is a Type II error.
• If we decide not to reject chance and thus allow for the plausibility of the null hypothesis
- This is an incorrect decision only if Ha is true.
- The probability of this incorrect decision is computed as 1 minus the power of the test.
P-value
-The probability of observing a sample mean that is as extreme or more extreme than the one observed.
• The probability is calculated assuming that the null hypothesis is true.
• We use the P-value to quantify how unlikely the observed sample mean is. (It becomes the basis of our reject / do-not-reject decision.)
• What sample means are as extreme or more extreme than the observed depends on the alternative hypothesis. So, the formulas for calculating the P-value depend on the alternative hypothesis.
-The P-value can also be defined as the probability of committing a Type I error based on your sample.
-So if the p-value is large, that indicates that the probability of making a type I error is great, you will not feel comfortable rejecting the null hypothesis.
-But if the p-value is small, that indicates that the probability of making a type I error is small, you will feel comfortable rejecting the null in favor of the alternative.
-How small is small, and how large is large? In this class, if the p-value < α, then the p-value is small enough to feel comfortable rejecting the null hypothesis. If the p-value ≥ α, then the p-value is large and you will not feel comfortable rejecting the null hypothesis.
Rejecting or Refusing to reject the null Hypothesis
-Do not reject the null hypothesis if the P-value is greater
than or equal to α.
-Reject the null hypothesis if the P-value is less than α.
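The decision rule is a single comparison; here it is as a small helper function (the function name and sample p-values are illustrative):

```python
def decision(p_value, alpha=0.05):
    """Reject H0 when the P-value is below alpha; otherwise do not reject."""
    return "reject H0" if p_value < alpha else "do not reject H0"

print(decision(0.03))   # reject H0
print(decision(0.20))   # do not reject H0
print(decision(0.05))   # boundary case: p-value >= alpha, so do not reject H0
```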
P-value approach
1.State the Null and Alternative Hypothesis
2.State the significance level.
3.Calculate the test statistic, z0.
4.Calculate the P-value.
5.Determine whether you reject or do not reject the null hypothesis.
6.State your conclusion in the context of the problem
Classical Approach
1.State the Null and Alternative Hypothesis
2.State the significance level.
3.Calculate the test statistic, z0.
4.Calculate the critical value, zc.
a. For a right-tailed test, zc satisfies P(Z > zc) = α
b. For a left-tailed test, zc satisfies P(Z < zc) = α
c. For a two-tailed test, there are two critical values, -zc and zc, where P(Z < -zc) = α/2 and P(Z > zc) = α/2
5.Determine whether you reject or do not reject the null hypothesis.
a. For a right tailed test, if z0 > zc, reject the null hypothesis
b. For a left tailed test, if z0 < zc, reject the null hypothesis
c. For a two-tailed test, if z0 < -zc or z0 > zc, reject the null hypothesis
6.State your conclusion in the context of the problem
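Both approaches can be run side by side on one worked example. Everything below (H0: µ = 100 vs Ha: µ > 100, σ = 15, n = 36, observed x̄ = 105, α = 0.05) is a hypothetical right-tailed one-sample z-test:

```python
import math
from statistics import NormalDist

# Hypothetical one-sample z-test: H0: mu = 100 vs Ha: mu > 100 (right-tailed),
# with known sigma = 15, n = 36, observed x-bar = 105, alpha = 0.05.
mu0, sigma, n, xbar, alpha = 100, 15, 36, 105, 0.05

z0 = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic: (105-100)/2.5 = 2.0
p_value = 1 - NormalDist().cdf(z0)           # P-value approach: right-tail area
zc = NormalDist().inv_cdf(1 - alpha)         # classical approach: zc ≈ 1.645

print(round(z0, 2))
print(round(p_value, 4))
print("reject H0" if z0 > zc else "do not reject H0")
```

Both approaches agree, as they must: the P-value (about 0.023) is below α = 0.05, and z0 = 2.0 exceeds zc ≈ 1.645, so H0 is rejected.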