Stat final exam and make-up review
Most analysts focus on the cost of tuition as the way to measure the cost of a college education. But
incidentals, such as textbook costs, are rarely considered. A researcher at the University of Oklahoma
wishes to estimate the textbook costs of first-year students at O.U. To do so, she monitored the textbook
cost of 250 first-year students and found that their average textbook cost was $600 per semester. Identify
the population of interest to the researcher.
All first-year University of Oklahoma Students
The evening host of a dinner reached into a bowl, mixed all tickets around, and selected the ticket to award
the grand door prize. What kind of sample will be generated?
Simple random sample
A telemarketer set the company's computerized dialing system to contact every 25th person listed in the
local telephone directory. What kind of sample will be generated?
Systematic sample
The Dean of Students mailed a survey to a total of 400 students. The sample included 100 students
randomly selected from each of the freshman, sophomore, junior, and senior classes on campus last term. What
kind of sample will be generated?
Stratified sample
The width of each bar in a histogram corresponds to the
Differences between the boundaries of the class
In a perfectly symmetrical bell-shaped "normal" distribution
The arithmetic mean equals the median, The median equals the mode, and The arithmetic mean equals the mode
In right-skewed distributions, which of the following is the correct statement?
The distance from Q1 to Q2 is less than the distance from Q2 to Q3
According to the empirical rule, if the data have a "bell-shaped" normal distribution, about _____________
percent of the observations will be contained within 2 standard deviations around the arithmetic mean.
95
Which of the following is NOT a measure of central tendency? (arithmetic mean, geometric mean, mode, interquartile range)
The interquartile range
You were told that the 1st, 2nd, and 3rd quartiles of female students' weight at the University of Oklahoma are 95
lbs, 125 lbs, and 138 lbs. What is the percentage of students who weigh more than 138 lbs?
25%
The rates of return for a stock over a three-year period are 0.527, 0.145, and 0.684. Which of the following
measures is the best measure of central tendency for these rates? (The arithmetic mean of return, the median return, the geometric mean rate of return, the second quartile of return)
The geometric mean rate of return
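A minimal sketch of that calculation, assuming the usual formula for the geometric mean rate of return, [(1 + R1)(1 + R2)...(1 + Rn)]^(1/n) - 1. The function name is hypothetical; the three rates are the ones in the question.

```python
import math

def geometric_mean_return(rates):
    """Geometric mean rate of return for per-period rates given as decimals."""
    # Compound the growth factors (1 + R), then take the n-th root.
    growth = math.prod(1 + r for r in rates)
    return growth ** (1 / len(rates)) - 1

rates = [0.527, 0.145, 0.684]            # rates from the question
geo = geometric_mean_return(rates)
arith = sum(rates) / len(rates)          # arithmetic mean, for comparison

print(f"geometric mean rate of return: {geo:.4f}")
print(f"arithmetic mean of returns:    {arith:.4f}")
```

The geometric figure comes out below the arithmetic mean because it accounts for compounding from period to period.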
Which of the following descriptive measures can be used to identify the outliers in a data set?
The Z-score for each observation
Let's play a game. You may win $200 with a probability of about 33%, and you may lose $100 with
a probability of about 66%. What is the expected value of this game?
Approximately $0
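A short sketch of the expected-value calculation for this game. It treats "about 33%" and "about 66%" as 0.33 and 0.66, which is an assumption (the intended odds are presumably 1/3 and 2/3); the variable names are illustrative only.

```python
# Each outcome is (payoff in dollars, probability); a loss is a negative payoff.
outcomes = [(200, 0.33), (-100, 0.66)]

# Expected value = sum of payoff * probability over all outcomes.
expected_value = sum(payoff * prob for payoff, prob in outcomes)

print(f"expected value: ${expected_value:.2f}")
```

The win and the loss each contribute about $66 in opposite directions, so the game is roughly fair.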
The Central Limit Theorem implies that:
Regardless of the population distribution, the sampling distribution of the mean is
approximately normal when the sample size is large enough.
The Standardized Normal Distribution
is bell-shaped and symmetric, with its mean being equal to zero (0) and its standard deviation
being equal to one (1).
Let's assume that B1 and B2 are mutually exclusive and collectively exhaustive events. Also assume
that the joint probability of A and B1 and the joint probability of A and B2 are non-zero. Given these
assumptions, identify the wrong statement:
The probability of an event like A can be written as the product of the conditional probability
of A given B1 and the conditional probability of A given B2
Using ________________, one may make use of new information to update a conditional probability.
Bayes' theorem
The historical data on the number of times that a given team in Major League Baseball has clinched its
division (i.e., has made it to the next round of the games) is available to almost everyone. You are
asked to report the probability that a given team clinches its division at least 3 times in the next 5
seasons. What kind of Probability Distribution Function would you use for this purpose?
Binomial Probability Distribution Function
The marketing department of a middle-sized manufacturer has 45 employees. 20 of them are female
and 25 of them are male. A group of 5 employees are randomly chosen to travel and meet with regional
sales departments. You are asked to compute the probability that at least one (1) female employee is
chosen for this committee. What kind of Probability Distribution Function would you use for this purpose?
Hypergeometric Probability Distribution Function
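As a rough illustration, the probability asked for in this question can be computed from the hypergeometric formula P(X = x) = C(A, x) C(N - A, n - x) / C(N, n). The function name is hypothetical; N = 45, A = 20 and n = 5 come from the question.

```python
from math import comb

def hypergeom_pmf(x, n, N, A):
    """P(X = x): x items of interest in a sample of n drawn without
    replacement from N items, A of which are items of interest."""
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)

N, A, n = 45, 20, 5                      # employees, females, group size
p_none = hypergeom_pmf(0, n, N, A)       # no female employee chosen
p_at_least_one = 1 - p_none              # complement rule

print(f"P(at least one female) = {p_at_least_one:.4f}")
```

Working through the complement (no females chosen) avoids summing the five cases x = 1 through 5.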
The historical data on the number of electric network outages per month are available for a local utility
provider. You are asked to compute the probability that more than one (1) outage occurs each month.
What kind of Probability Distribution Function would you use for this purpose?
Poisson Probability Distribution Function
X is a continuous random variable (e.g. time required to download a music file), which is normally
distributed with the mean μ and standard deviation σ. Probability of X being less than XL is equal to PL.
Probability of X being more than XU is equal to PU. Find P(XL ≤ X ≤ XU):
P(XL ≤ X ≤ XU) = 1 - (PU + PL)
Let Y be a discrete random variable with a Poisson distribution. And let X be a continuous random
variable with a Normal distribution. Also, let C be a constant. Identify the correct statement.
P(X=C) is always equal to zero
The mean of the Sampling Distribution of the Means is an unbiased estimator of the population
when the sample size is large enough
Identify the correct statement:
a. We make use of sample statistics (e.g. the sample mean) to estimate population parameters
(e.g. the population mean) [correct answer]
b. We make use of population parameters (e.g. population mean) to estimate sample statistics
(e.g. the sample mean)
c. The sampling distribution of the mean always follows the population distribution
d. The sampling distribution of the mean is normal only when the population is normal
A and B are two independent events. P(A|B) is equal to:
P(A)
X is a continuous random variable, which is distributed normally. From the X continuum, we choose a
given value, called X*. The Z-value of X* is equal to Z*. The probability of Z ≤ Z* is equal to π*. What is
the probability of X > X*?
P(X > X*) = 1 - π*
X is a continuous random variable. Z is the Z-value associated with the observations in X. We can say
that X is normally distributed when:
X is a linear function of Z
A population of interest is not distributed normally. A group of researchers repeatedly choose a number
of random samples from this population. As they choose more samples, they increase the sample sizes.
Complete the following statement:
As sample sizes increase, the standard deviation of the Sampling Distribution of the Means _____
Decreases
X is a continuous normal variable with a Normal Probability Distribution. Z is the Z-values associated
with X. The Cumulative Standardized Normal Distribution table/function includes:
DEFINE the variables that you want to study in order to solve a problem or meet an objective
COLLECT the data for those variables from appropriate sources
ORGANIZE the data collected by developing tables
VISUALIZE the data collected by developing charts
ANALYZE the data collected to reach conclusions and present those results
Have values that can only be placed into categories such as yes and no
Have values that represent a counted or measured quantity
have numerical values that arise from a counting process.
EX: Number of items purchased
have numerical values that arise from a measuring process.
EX: Time spent waiting in a checkout line
classifies data into distinct categories in which no ranking is implied
classifies values into distinct categories in which ranking is implied
an ordered scale in which the difference between measurements is a meaningful quantity but does not involve a true zero point
ordered scale in which the difference between the measurements involves a true zero point, as in height, age, or salary measurements.
Primary Data Source
Collect your own data
Secondary data source
Someone else collected the data you are using
Consists of all items or individuals about which you want to reach conclusions
portion of the population selected for analysis
data that follows some organizing principle or plan, typically a repeating pattern
follows no repeating pattern
The category definitions cause each data value to be placed in one and only one category
The set of categories you create for the new, recoded variables include all the data values being recoded
Simple random sample
Every item from a frame has the same chance of selection as every other item, and every sample of a fixed size has the same chance of selection as every other sample of that size.
You partition the N items in the frame into n groups of k items, where k=N/n
Round k to the nearest integer. To select a systematic sample, you choose the first item to be selected at random from the first k items in the frame. Then, you select the remaining n-1 items by taking every kth item thereafter from the entire frame.
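The selection procedure above can be sketched as follows. This is only an illustration: the function name and the frame of 1,000 numbered items are hypothetical, and k is rounded down rather than to the nearest integer so that the last selected index never runs past the end of the frame.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick n items: a random start among the first k items, then every k-th item."""
    k = len(frame) // n                  # group size k = N / n, rounded down here
    rng = random.Random(seed)
    start = rng.randrange(k)             # random position within the first k items
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 1001))             # hypothetical frame of N = 1000 items
sample = systematic_sample(frame, n=40, seed=7)

print(sample[:5])                        # consecutive picks are k = 25 apart
```

With N = 1000 and n = 40 the step is k = 25, which mirrors the "every 25th person" dialing question earlier in the set.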
You first subdivide the N items in the frame into separate subpopulations, or strata. A stratum is defined by some common characteristic, such as gender or year in school. You select a simple random sample within each of the strata and combine the results from the separate simple random samples.
divide the N items in the frame into clusters that contain several items. Clusters are often naturally occurring groups, such as counties. You then take a random sample of one or more clusters and study all items in each selected cluster
Tallies the values as frequencies or percentages for each category. Helps you see the differences among the categories by displaying the frequency, amount, or percentage of items in a set of categories in a separate column
cross-tabulates, or tallies jointly, the values of two or more categorical variables, allowing you to study patterns that may exist between the variables. Tallies can be shown as a frequency, percentage of the overall total, percentage of the row total, or a percentage of the column total, depending on the type of contingency table you use. Each tally appears in its own cell, and there is a cell for each joint response.
Tallies the values of a numerical variable into a set of numerically ordered classes. Each class groups a mutually exclusive range of values, called a class interval. Each value can be assigned to only one class, and every value must be contained in one of the class intervals.
What are class intervals identified by?
Their class midpoints
Relative frequency distribution
presents the relative frequency, or proportion, of the total for each group that each class represents
presents the percentage of the total for each group that each class represents
Proportion (or relative frequency)
is equal to the number of values in each class divided by the total number of values.
Cumulative percentage distribution
provides a way of presenting information about the percentage of values that are less than a specific amount. You use a percentage distribution as the basis to construct a cumulative percentage distribution.
visualizes a categorical variable as a series of bars, with each bar representing the tallies for a single category. The length of each bar represents either the frequency or percentage of values for a category and each bar is separated by a gap
The tallies for each category are plotted as vertical bars in descending order, according to their frequencies, and are combined with a cumulative percentage line on the same chart. They get their name from the Pareto principle, the observation that in many data sets, a few categories of a categorical variable represent the majority of the data, while many other categories represent a relatively small, or trivial, amount of data. They help you separate the "vital few" categories from the "trivial many" so that you can focus on the important categories.
Side-by-side bar chart
uses sets of bars to show the joint responses from 2 categorical variables
visualizes data as a vertical bar chart in which each bar represents a class interval from a frequency or percentage distribution.
Used when using a categorical variable to divide the data of a numerical variable into 2 or more groups. This chart uses the midpoints of each class interval to represent the data of each class and then plots the midpoints, at their respective class percentages, as points on a line along the X axis
Cumulative percentage polygon (ogive)
uses the cumulative percentage distribution to plot the cumulative percentages along the Y axis. Unlike the percentage polygon, the lower boundaries of the class intervals for the numerical variable are plotted, at their respective class percentages, as points on a line along the X axis
Multidimensional contingency table
used to tally the responses of 3 or more categorical variables.
A variable that is affecting the results of the other variables
the extent to which the values of a numerical variable group around a typical, or central, value.
measures the amount of dispersion, or scattering, away from a central value that the values of a numerical variable show. The shape of a variable is the pattern of the distribution of values from the lowest value to the highest value
typically referred to as the mean, is the most common measure of central tendency.
Middle value in an ordered array of data that has been ranked from smallest to largest.
If you have an even amount of numbers, average 2 middle values.
Used when you want to measure the rate of change over time
= the nth root of the product of n values
Variance and Standard deviation
2 commonly used measures of variation that account for how all the values are distributed
How to hand compute sample variance
1. Compute the difference between each value and the mean
2. square each difference
3. sum the squared differences
4. divide this total by n-1 to compute sample variance
5. take the square root of the sample variance to compute sample standard deviation
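The five steps above translate almost line for line into code. This is a minimal sketch with made-up data; the function name is hypothetical, and the standard library's `statistics.variance` is used only as a cross-check.

```python
import math
import statistics

def sample_variance(values):
    """Sample variance computed by hand, following the five steps."""
    n = len(values)
    mean = sum(values) / n
    differences = [x - mean for x in values]      # step 1
    squared = [d ** 2 for d in differences]       # step 2
    total = sum(squared)                          # step 3
    return total / (n - 1)                        # step 4

data = [2, 4, 4, 4, 5, 5, 7, 9]                   # hypothetical values
variance = sample_variance(data)
std_dev = math.sqrt(variance)                     # step 5

print(f"sample variance: {variance:.4f}")
print(f"sample standard deviation: {std_dev:.4f}")
print(f"library check: {statistics.variance(data):.4f}")
```

Dividing by n - 1 rather than n is what makes this the sample (not population) variance.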
Coefficient of Variation
measures the scatter in the data relative to the mean.
the difference between that value and the mean, divided by the standard deviation. A Z score of 0 indicates that the value is the same as the mean. A positive or negative Z score indicates whether the value is above or below the mean and by how many standard deviations. Helps identify outliers.
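A small sketch of how Z scores can flag outliers. The data are invented, and the cutoff of 3 standard deviations is a common rule of thumb assumed here, not something stated in the card itself.

```python
import statistics

def z_scores(values):
    """Z score of each value: (value - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]

# Hypothetical data with one unusually large value.
data = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 11, 9, 10, 10, 60]
THRESHOLD = 3                                      # assumed rule-of-thumb cutoff

flagged = [x for x, z in zip(data, z_scores(data)) if abs(z) > THRESHOLD]
print("possible outliers:", flagged)
```

Note that in very small samples no value can reach a Z score of 3, so a fixed cutoff like this is only meaningful with enough observations.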
Most values are in upper portion
Most values are in the lower portion
negative, or left-skewed distribution
Symmetrical distribution with 0 skewness
Positive, or right-skewed distribution
Measures the extent to which values that are very different from the mean affect the shape of the distribution of a set of data. It affects the peakedness of the curve of the distribution, that is, how sharply the curve rises approaching the center of the distribution.
a kurtosis value that is greater than 0
a kurtosis value that is less than 0
split the values into 4 equal parts
divides the smallest 25% of the values from the other 75% that are larger
the median; 50% of the values are smaller than or equal to the median, and 50% are larger than or equal to the median.
divides the smallest 75% of the values from the largest 25%
Split a variable into 100 equal parts
measures the difference in the center of a distribution between the third and first quartiles
Descriptive statistics such as the median, Q1, Q3, and the interquartile range, which are not influenced by extreme values
Sum of the values in the population divided by the population size, N.
States that for population data that form a normal distribution, the following are true:
1. Approximately 68% of the values are within ±1 standard deviation of the mean
2. Approximately 95% of the values are within ±2 standard deviations of the mean
3. Approximately 99.7% of the values are within ±3 standard deviations of the mean
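These three percentages can be checked against the standardized normal distribution. A minimal sketch using the standard library's `statistics.NormalDist`; the helper name is hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.4f}")
```

The exact areas differ slightly from the rounded 68%, 95% and 99.7% figures, which is why the rule is stated with "approximately".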
The Chebyshev Rule
States that for any data set, regardless of shape, the percentage of values found within k standard deviations of the mean must be at least (1 - 1/k^2) × 100%. You can use this rule for any value of k greater than 1. Use this rule for heavily skewed data sets that do not appear to be normally distributed. The rule indicates at least what percentage of the values fall within a given distance from the mean.
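The bound is easy to tabulate. A minimal sketch; the function name is hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the Chebyshev rule applies only for k > 1")
    return (1 - 1 / k ** 2) * 100

for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1f}% of values")
```

For k = 2 the guarantee is 75%, noticeably weaker than the roughly 95% that holds for a normal distribution, because the rule makes no assumption about shape.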
measures the strength of the linear relationship between 2 numerical variables
Coefficient of Correlation
Measures the relative strength of a linear relationship between 2 numerical variables. Ranges from -1 for a perfect negative correlation to +1 for a perfect positive correlation
Numerical value representing the chance, likelihood, or possibility that a particular event will occur.
Probability of an occurrence is based on prior knowledge of the process involved.
Probabilities are based on observed data, not on prior knowledge of a process
differs from person to person; usually based on a person's past experience, personal opinion, and analysis of a particular situation
Each possible outcome of a variable
Described by a single characteristic
An event that has 2 or more characteristics
collection of all possible events
probability of the occurrence of a simple event
probability of occurrence involving 2 or more events
Consists of a set of joint probabilities (Add them all together)
General addition rule
P(A or B)= P(A)+P(B)-P(A and B)
refers to the probability of event A, given information about the occurrence of another event, B.
P(A|B) = P(A and B) / P(B)
alternative to a contingency table
When the outcome of one event does not affect the probability of occurrence of another event. 2 events are independent if P(A|B) = P(A)
General multiplication rule
P(A and B) = P(A|B)P(B)
Multiplication rule for independent events
P(A and B)= P(A)P(B)
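The addition, conditional-probability, independence and multiplication rules above can be tried out on a small table of joint counts. The counts below are invented for illustration; exact fractions are used so the independence check is not affected by rounding.

```python
from fractions import Fraction

# Hypothetical joint counts for events A and B (1,000 observations in total).
counts = {
    ("A", "B"): 200,
    ("A", "not B"): 50,
    ("not A", "B"): 100,
    ("not A", "not B"): 650,
}
total = sum(counts.values())

p_a_and_b = Fraction(counts[("A", "B")], total)                      # joint
p_a = Fraction(counts[("A", "B")] + counts[("A", "not B")], total)   # marginal
p_b = Fraction(counts[("A", "B")] + counts[("not A", "B")], total)   # marginal

p_a_or_b = p_a + p_b - p_a_and_b      # general addition rule
p_a_given_b = p_a_and_b / p_b         # conditional probability
independent = p_a_given_b == p_a      # independence check

print("P(A or B) =", p_a_or_b)
print("P(A|B)    =", p_a_given_b)
print("independent:", independent)
print("P(A|B)P(B) equals P(A and B):", p_a_given_b * p_b == p_a_and_b)
```

Because A and B are not independent in this table, P(A)P(B) would not reproduce the joint probability; only the general multiplication rule does.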
used to revise previously calculated probabilities based on new information
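This kind of revision is what Bayes' theorem does. A minimal sketch for two mutually exclusive and collectively exhaustive events B1 and B2, using P(B1|A) = P(A|B1)P(B1) / [P(A|B1)P(B1) + P(A|B2)P(B2)]. All the numbers are hypothetical.

```python
def bayes_posterior(prior_b1, p_a_given_b1, p_a_given_b2):
    """Revised probability P(B1 | A) when B1 and B2 = not-B1 cover all cases."""
    prior_b2 = 1 - prior_b1
    joint_b1 = p_a_given_b1 * prior_b1           # P(A and B1)
    joint_b2 = p_a_given_b2 * prior_b2           # P(A and B2)
    return joint_b1 / (joint_b1 + joint_b2)      # divide by P(A)

# Hypothetical inputs: prior P(B1) = 0.4, P(A|B1) = 0.8, P(A|B2) = 0.3.
posterior = bayes_posterior(0.4, 0.8, 0.3)
print(f"P(B1 | A) = {posterior:.2f}")
```

The denominator is P(A), written as the sum of the two joint probabilities rather than a product of conditional probabilities.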
Probability distribution for a discrete variable
mutually exclusive list of all the possible numerical outcomes along with the probability of occurrence of each outcome
the mean of the probability distribution
Covariance of a probability distribution
measures the strength of the relationship between 2 variables
mathematical expression that represents a variable of interest.
Probability distribution function
math model for discrete random variables
Used when the discrete variable is the number of events of interest in a sample of n observations; has 4 important properties:
1. The sample consists of a fixed number of observations, n.
2. Each observation is classified into one of 2 mutually exclusive and collectively exhaustive categories.
3. The probability of an observation being classified as the event of interest, π, is constant from observation to observation. Thus, the probability of an observation being classified as not being the event of interest, 1 - π, is constant over all observations.
4. The value of any observation is independent of the value of any other observation
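When these four properties hold, probabilities follow the binomial formula P(X = x) = C(n, x) π^x (1 - π)^(n - x). The sketch below computes "at least 3 events of interest in 5 observations"; the value π = 0.4 is purely hypothetical, as is the function name.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(X = x) events of interest in n independent observations."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

n, pi = 5, 0.4                                   # pi is an assumed value
p_at_least_3 = sum(binomial_pmf(x, n, pi) for x in range(3, n + 1))

print(f"P(X >= 3) = {p_at_least_3:.5f}")
```

"At least 3" means adding the probabilities for x = 3, 4 and 5.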
Used to calculate probabilities in situations such as these if the following properties hold:
1. You are interested in counting the number of times a particular event occurs in a given area of opportunity. The area of opportunity is defined by time, length, surface area, and so forth.
2. The probability that an event occurs in a given area of opportunity is the same for all the areas of opportunity
3. The number of events that occur in one area of opportunity is independent of the number of events that occur in any other area of opportunity
4. The probability that 2 or more events will occur in an area of opportunity approaches 0 as the area of opportunity becomes smaller.
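Under these properties, probabilities come from the Poisson formula P(X = x) = e^(-λ) λ^x / x!, where λ is the mean number of events per area of opportunity. The sketch computes the probability of more than one event; λ = 2 is a hypothetical value.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) events in one area of opportunity with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 2.0                                        # assumed mean per area of opportunity
p_more_than_one = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)

print(f"P(X > 1) = {p_more_than_one:.4f}")
```

Since X has no upper limit, "more than one" is computed as the complement of X = 0 and X = 1.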
The sample data are selected without replacement from a finite population, thus the result of one observation is dependent on the results of the previous observations.
the most common continuous distribution used in statistics. It is vitally important in statistics for 3 main reasons:
1. Numerous continuous variables common in business have distributions that closely resemble the normal distribution
2. The normal distribution can be used to approx. various discrete probability distributions
3. It provides the basis for classical statistical inference because of its relationship to the central limit theorem.
It is represented by the classic bell shape
Important theoretical properties of the normal distribution
1. It is symmetrical, and its mean and median are therefore equal
2. It is bell-shaped in appearance
3. Its interquartile range is equal to 1.33 standard deviations. Thus, the middle 50% of the values are contained within an interval of two-thirds of a standard deviation below the mean and two-thirds of a standard deviation above the mean
4. It has an infinite range
Normal probability plot
a visual display that helps you evaluate whether the data are normally distributed
Sampling distribution of the mean
The distribution of all possible sample means if you select all possible samples of a given size
Central Limit theorem
As the sample size gets large enough, the sampling distribution of the mean is approx. normally distributed. This is true regardless of the shape of the distribution of the individual values in the population
Conclusions of the central limit theorem
1. For most distributions, regardless of the shape of the population, the sampling distribution of the mean is approx. normally distributed if samples of at least size 30 are selected
2. If the distribution of the population is fairly symmetrical, the sampling distribution of the mean is approx. normal for samples as small as size 5.
3. If the population is normally distributed, the sampling distribution of the mean is normally distributed, regardless of the sample size.
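A small simulation can make these conclusions concrete. This is only a sketch: the population is assumed to be exponential with mean 1 and standard deviation 1 (a strongly right-skewed shape chosen for illustration), and the sample counts and seed are arbitrary.

```python
import random
import statistics

def simulate_sample_means(sample_size, n_samples=2000, seed=1):
    """Means of n_samples random samples drawn from a skewed population."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

for size in (5, 30):
    means = simulate_sample_means(size)
    print(f"n = {size:2d}: mean of sample means = {statistics.fmean(means):.3f}, "
          f"SD of sample means = {statistics.stdev(means):.3f}")
```

The sample means center on the population mean for both sizes, while their spread shrinks as the sample size grows, roughly in line with σ divided by the square root of n.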
the variation that occurs due to selecting a single sample from the population. The size of the sampling error is primarily based on the amount of variation in the population and on the sample size. Large samples have less sampling error than small samples, but large samples cost more to select.
very similar in appearance to the standardized normal distribution. The t distribution has more area in the tails and less in the center than does the standardized normal distribution.
What 3 quantities do you need to compute the sample size
1. The desired confidence level, which determines the value of the critical value from the standardized normal distribution
2. The acceptable sampling error
3. The standard deviation
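With those three quantities, the sample size for estimating a mean is commonly computed as n = Z²σ² / e², rounded up. A minimal sketch assuming that formula; the function name and the example inputs (95% confidence, sampling error of 5, standard deviation of 25) are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(confidence, sampling_error, sigma):
    """Sample size needed to estimate a mean, rounded up to a whole item."""
    # Two-tailed critical value from the standardized normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / sampling_error) ** 2)

n_needed = sample_size_for_mean(confidence=0.95, sampling_error=5, sigma=25)
print("required sample size:", n_needed)
```

Rounding up rather than to the nearest integer keeps the sampling error at or below the acceptable level.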
The hypothesis that the population parameter is equal to the company specification.
the conclusion reached by rejecting the null hypothesis
Summary of null and alternative hypothesis
1. The null hypothesis represents the current belief in the situation
2. The alternative hypothesis is the opposite of the null hypothesis and represents a research claim or specific inference you would like to prove
3. If you reject the null hypothesis, you have statistical proof that the alternative hypothesis is correct
4. If you do not reject the null hypothesis, you have failed to prove the alternative hypothesis. The failure to prove the alternative hypothesis, however, does not mean that you have proven the null hypothesis.
5. The null hypothesis always refers to a specified value of the population parameter, not a sample statistic
6. The statement of the null hypothesis always contains an equal sign regarding the specified value of the population parameter
7. The statement of the alternative hypothesis never contains an equal sign regarding the specified value of the population parameter
The first thing you determine to make a decision concerning the null hypothesis; it divides the nonrejection region from the rejection region. The size of the rejection region is directly related to the risks involved in using only sample evidence to make decisions about a population parameter
Type 1 Error
occurs if you reject the null hypothesis when it is true and should not be rejected; known as a "false alarm"
Type 2 error
occurs if you do not reject the null hypothesis when it is false and should be rejected; known as a "missed opportunity" to take some corrective action
Level of significance
probability of committing a type 1 error
probability of committing a type 2 error
the complement of the probability of a type 1 error; the probability that you will not reject the null hypothesis when it is true and should not be rejected.
Power of a statistical test
The complement of the probability of a type 2 error; the probability that you will reject the null hypothesis when it is false and should be rejected
the probability of getting a test statistic equal to or more extreme than the sample result, given that the null hypothesis is true; known as the observed level of significance. Using the p-value to determine rejection and nonrejection is another approach to hypothesis testing
The decision rules for rejecting the null hypothesis in the p-value approach are
1. If the p-value is greater than or equal to α, do not reject the null hypothesis
2. If the p-value is less than α, reject the null hypothesis
IF THE P-VALUE IS LOW, THE NULL HYPOTHESIS MUST GO
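The decision rule can be written as a single comparison. In this sketch the function name and the default level of significance of 0.05 are assumptions made for illustration.

```python
def p_value_decision(p_value, alpha=0.05):
    """Apply the p-value rule: reject only when the p-value is below alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"

for p in (0.012, 0.05, 0.31):
    print(f"p-value = {p}: {p_value_decision(p)}")
```

A p-value exactly equal to the level of significance falls on the "do not reject" side of the rule.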
t test is an example; it does not lose power if the shape of the population departs somewhat from a normal distribution, particularly when the sample size is large enough to enable the test statistic to follow the t distribution.
Summary of the null and alternative hypotheses for one-tail tests
1. The null hypothesis represents the status quo or the current belief in a situation
2. The alternative hypothesis is the opposite of the null hypothesis and represents a research claim or specific inference you would like to prove
3. If you reject the null hypothesis, you have statistical proof that the alternative hypothesis is correct
4. If you do not reject the null hypothesis, you have failed to prove the alternative hypothesis. The failure to prove the alternative hypothesis, however, does not mean that you have proven the null hypothesis.
5. The null hypothesis always refers to a specified value of the population parameter, not to a sample statistic
6. The statement of the null hypothesis always contains an equal sign regarding the specified value of the parameter
7. The statement of the alternative hypothesis never contains an equal sign regarding the specified value of the parameter.
pooled-variance t test
Can be used if you assume that the random samples are independently selected from 2 populations and that the populations are normally distributed and have equal variances to determine whether there is a significant difference between the means
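A sketch of the test statistic under those assumptions, for the null hypothesis of no difference between the two population means. It uses the standard pooled-variance formulas: Sp² = [(n1 - 1)S1² + (n2 - 1)S2²] / (n1 + n2 - 2) and t = (mean1 - mean2) / sqrt(Sp²(1/n1 + 1/n2)), with n1 + n2 - 2 degrees of freedom. The function name and both samples are hypothetical.

```python
import math
import statistics

def pooled_variance_t(sample1, sample2):
    """Return (t statistic, degrees of freedom) for H0: equal population means."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    t_stat = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t_stat, n1 + n2 - 2

group_a = [10, 12, 14]                   # hypothetical sample 1
group_b = [13, 15, 17]                   # hypothetical sample 2
t_stat, df = pooled_variance_t(group_a, group_b)

print(f"t = {t_stat:.4f} with {df} degrees of freedom")
```

The resulting statistic would then be compared with critical values from the t distribution for the chosen level of significance.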
When do you reject the null hypothesis in a two tail test?
if the computed test statistic is greater than the upper-tail critical value from the t distribution or if the computed test statistic is less than the lower tail critical value from the t distribution
Separate-variance t test
Used if you can assume that the 2 independent populations are normally distributed but cannot assume that they have equal variances. In that case you cannot pool the two sample variances into a common estimate, so the pooled-variance t test does not apply.
paired t test
Can use if you assume that the difference scores are randomly and independently selected from a population that is normally distributed in order to determine whether there is a significant population mean difference