Terms in this set (56)
A histogram's x-axis represents bins corresponding to ranges of data; its y-axis indicates the frequency of observations falling into each bin.
A value that falls far from the rest of the data
Measures the degree of a graph's asymmetry.
Equal to sum of all data points in the set divided by the number of data points
Middle value of the data set; the 50th percentile of the data set.
Value that occurs most frequently in the data set. A data set may have more than one.
The mean of a subset of the data. We apply a condition and calculate the mean for values that meet that condition.
another value of interest
Equal to the square root of the variance. It has the same units as the data itself.
Measures the size of the standard deviation relative to the size of the mean
Reveals the relationships between two variables, or data sets.
We can quantify the strength of a linear relationship between two variables by calculating the ....
When one of the variables is time, the relationship is known as a....
Cross sectional data
Provides a snapshot of data across multiple groups at a given point in time
Calculate the Mean on Excel
=AVERAGE(B2:B193) or, equivalently, =SUM(B2:B193)/192 (the cell range is only an example; B2:B193 holds 192 values)
Standard Deviation Calculation on Excel
=STDEV.S(number1, [number2], ...)
By removing outliers from the data set, the standard deviation....
The standard deviation decreases. The standard deviation gives more weight to observations that are further from the mean. Therefore, removing the outliers would decrease the standard deviation.
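The effect of an outlier on the standard deviation can be checked with a minimal Python sketch. The data values are hypothetical, and `statistics.stdev` is used as the counterpart of Excel's sample standard deviation `STDEV.S`.

```python
import statistics

# Hypothetical data: five typical values plus one outlier (120).
data = [10, 12, 11, 13, 12, 120]
without_outlier = [x for x in data if x != 120]

# statistics.stdev computes the sample standard deviation (divides by n - 1).
sd_all = statistics.stdev(data)
sd_trimmed = statistics.stdev(without_outlier)

print(f"mean with outlier:       {statistics.mean(data):.2f}")
print(f"std dev with outlier:    {sd_all:.2f}")
print(f"std dev without outlier: {sd_trimmed:.2f}")
```

Removing a point in code is only a way to see the effect; whether a real data point should be removed is a separate judgment call.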
If outliers exist in a data set, one should...
Research the data points and then make a decision based on the findings. Example (National Museum of American History question, Quiz 1, Q6): The consultant should delete or change data points only if careful examination of the data and the data sources indicates that the data points are incorrect or irrelevant to the research at hand. The consultant must use his or her experience and knowledge of the research question to make decisions on a case-by-case basis. Doing business analytics effectively requires judgment. In this case, the National Museum of American History underwent renovations that significantly reduced the number of visits to the museum in 2007 and 2008. The data points for 2007 and 2008 are correct and should not be changed. However, the fact that the museum was closed during most of that two-year period should be considered when drawing conclusions from this data set.
Conditional Mean Excel
=AVERAGEIF(range, criteria, [average_range])
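A conditional mean like the one `=AVERAGEIF` returns can be sketched in Python as follows. The helper name `average_if` and the region/sales data are hypothetical and exist only for illustration.

```python
def average_if(criteria_values, predicate, average_values=None):
    """Mean of the values whose matching criteria value satisfies predicate.

    If average_values is omitted, the criteria values themselves are averaged,
    which mirrors the optional third argument of AVERAGEIF.
    """
    if average_values is None:
        average_values = criteria_values
    selected = [v for c, v in zip(criteria_values, average_values) if predicate(c)]
    if not selected:
        raise ValueError("no values meet the condition")
    return sum(selected) / len(selected)


# Hypothetical data: sales by region.
regions = ["East", "West", "East", "West"]
sales = [100, 200, 300, 400]

east_mean = average_if(regions, lambda r: r == "East", sales)
large_sales_mean = average_if(sales, lambda s: s > 150)
print(east_mean, large_sales_mean)
```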
Which of the following formulas would calculate the statistic that is MOST APPROPRIATE for comparing the variability of two data sets with different distributions?
Standard Deviation/Mean. Explanation: This is the formula for the coefficient of variation, the best statistic to compute to compare the variability of two data sets with different distributions. Dividing by the mean provides a measure of the distribution's variation relative to the mean.
Coefficient of Variation Formula Excel
Standard Deviation / Mean, for example =STDEV.S(range)/AVERAGE(range)
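As a rough illustration of why dividing by the mean helps when comparing data sets on different scales, here is a small Python sketch. Both data sets are hypothetical and were chosen to have the same standard deviation but different means.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical data sets with equal spread but different means.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"CV heights: {cv_heights:.3f}")
print(f"CV weights: {cv_weights:.3f}")
```

The standard deviations match, yet the weights vary more relative to their mean, so their coefficient of variation is larger.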
Percentile Formula Excel
=PERCENTILE.INC(array, k)
Correlation Coefficient Formula Excel
=CORREL(array1, array2)
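The correlation coefficient that `=CORREL` computes can be sketched in plain Python. The function name `correl` and the sample lists are illustrative assumptions; the calculation is the standard Pearson formula.

```python
import math


def correl(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys):
        raise ValueError("both lists must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data: a perfectly increasing and a perfectly decreasing relationship.
r_positive = correl([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = correl([1, 2, 3], [3, 2, 1])
print(r_positive, r_negative)
```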
When the distribution of data is skewed to the right, the mean is most likely greater than the median. The extreme values in the right tail pull the mean towards them.
When the distribution of data is skewed to the left, the mean is most likely less than the median. The extreme values in the left tail pull the mean towards them.
When the distribution of data is symmetric, the mean and median are equal.
When the distribution of data is bimodal, the mean may be less than, equal to, or greater than the median.
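The pull of a long tail on the mean can be seen in a short Python sketch. Both lists are hypothetical; the second is a mirror image of the first so that it is skewed to the left.

```python
import statistics

# Hypothetical right-skewed data: one extreme value in the right tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirror image of the same data, so the extreme value sits in the left tail.
left_skewed = [21 - x for x in right_skewed]

for name, values in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    mean = statistics.mean(values)
    median = statistics.median(values)
    print(f"{name}: mean={mean:.2f}, median={median:.2f}")
```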
What happens to the sample mean and standard deviation as you take new samples of equal size?
The sample mean and standard deviation vary but remain fairly close to the population mean and standard deviation. Explanation: Since each sample is randomly selected, the mean and standard deviation vary from one sample to the next. However, since the sample size is fairly large, each sample's mean and standard deviation are fairly close to the population mean and standard deviation. We'll learn more about how to select a good sample later.
How do you make sound inferences?
Make sure the sample is representative of the population by choosing members randomly to ensure that each member of the population is equally likely to be included in the sample.
How to avoid biased results?
- phrasing questions neutrally;
- ensuring that the sampling method is appropriate for the demographic of the target population; and
- pursuing high response rates.
It is often better to have a smaller sample with a high response rate than a larger sample with a low response rate.
A unique symmetrical shape whose center and width are determined by its mean and standard deviation respectively. Due to the normal ______symmetric shape, 50% of the probability lies below the mean, and 50% lies above the mean.
For every normal distribution, the probability of being within a specified number of standard deviations from the mean is the same.
68% of the probability is contained in the range reaching one standard deviation away from the mean on either side, that is, P(μ − σ ≤ x ≤ μ + σ) ≈ 68%.
95% of the probability is contained in the range reaching two standard deviations (1.96 to be exact) away from the mean on either side, that is, P(μ − 1.96σ ≤ x ≤ μ + 1.96σ) ≈ 95%.
99.7% of the probability is contained in the range reaching three standard deviations away from the mean on either side, that is, P(μ − 3σ ≤ x ≤ μ + 3σ) ≈ 99.7%.
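The 68%, 95%, and 99.7% figures can be reproduced with Python's standard-library `statistics.NormalDist`. This is a minimal sketch; the helper name `prob_within` is an assumption made for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def prob_within(k):
    """Probability of lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 1.96, 2, 3):
    print(f"within {k} standard deviations: {prob_within(k):.4f}")
```

Because the result depends only on the number of standard deviations, the same probabilities hold for a normal distribution with any mean and standard deviation.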
The z-value of a point x is the distance x lies from the mean, measured in standard deviations: z = (x − mean) / standard deviation.
Central Limit Theorem
If we take enough sufficiently large samples from any population, the means of those samples will be normally distributed, regardless of the shape of the underlying population.
Distribution of Sample Means
The distribution of the means of many samples. It more closely approximates a normal curve as we increase the number of samples and/or the sample size.
The mean of the Distribution of Sample Means equals the mean of the population distribution.
The standard deviation of the Distribution of Sample Means equals the standard deviation of the population distribution divided by the square root of the sample size. Thus, increasing the sample size decreases the width of the Distribution of Sample Means.
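These properties of the Distribution of Sample Means can be explored with a small simulation. The sketch below assumes a hypothetical skewed population (exponential, with mean 1 and standard deviation 1); the sample size and number of samples are arbitrary choices.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 50
NUM_SAMPLES = 2000


def draw_sample(size):
    """Draw one sample from an exponential population (mean 1, std dev 1)."""
    return [random.expovariate(1.0) for _ in range(size)]


sample_means = [statistics.mean(draw_sample(SAMPLE_SIZE)) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1.0 / math.sqrt(SAMPLE_SIZE)  # population std dev / sqrt(sample size)

print(f"mean of sample means:    {mean_of_means:.3f} (population mean is 1)")
print(f"std dev of sample means: {sd_of_means:.3f} (expected about {expected_sd:.3f})")
```

Raising `SAMPLE_SIZE` should shrink the spread of the sample means, even though the population itself is far from normal.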
The sample mean is only a point estimate. We can construct a range around the sample mean, called a _______ which contains the true population mean with a certain level (e.g., 95%) of confidence
For a 95% confidence interval, on average, 95% of samples drawn from the population will have the population mean within the confidence interval. Note that a confidence interval's level of confidence does not tell us the chance, probability, or likelihood that an individual confidence interval contains the true population mean.
The width of the confidence interval depends on the level of confidence, our best estimate of the population standard deviation, and the sample size. We control only the level of confidence and the sample size.
Examples of Biased Questions
Isn't Daft Punk a better band than Oasis?
Research has linked carbon emissions to global warming.
Do you think the US government should enact legislation to limit carbon emissions?
Do you enjoy the work of such literary giants as William Shakespeare?
Do you think people benefit from taking overpriced diet supplements?
Examples of Unbiased Questions
Do you believe that current popular music is better, worse, or about the same quality as popular music from 20 years ago?
Do you think women should be drafted into the military?
How often do you eat spinach, kale, or other leafy green vegetables?
According to the Central Limit Theorem, the means of random samples from which of the following distributions will be normally distributed, assuming the samples are sufficiently large?
According to the Central Limit Theorem, if we take large enough samples, the distribution of sample means will be normally distributed regardless of the shape of the underlying population.
Several probability expressions for a normal distribution
Normal Distribution Excel Formula
=NORM.DIST(x, mean, standard_dev, TRUE)
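For comparison, the cumulative probability that `=NORM.DIST(x, mean, standard_dev, TRUE)` returns corresponds to `NormalDist.cdf` in Python's standard library. The mean of 100 and standard deviation of 15 below are hypothetical values.

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

p_at_most_115 = dist.cdf(115)                     # P(X <= 115)
p_above_115 = 1 - dist.cdf(115)                   # P(X > 115)
p_between_85_115 = dist.cdf(115) - dist.cdf(85)   # P(85 < X <= 115)

print(f"P(X <= 115)      = {p_at_most_115:.4f}")
print(f"P(X > 115)       = {p_above_115:.4f}")
print(f"P(85 < X <= 115) = {p_between_85_115:.4f}")
```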
Confidence Interval Formula
Margin of Error = CONFIDENCE.NORM(alpha, standard_dev, size)
The lower bound of the 95% confidence interval is the mean minus the margin of error
The upper bound of the 95% confidence interval is the mean plus the margin of error
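A margin of error of the kind `CONFIDENCE.NORM` returns can be sketched in Python as the z-value for the chosen confidence level times the standard deviation divided by the square root of the sample size. The function name `confidence_norm` and the sample figures are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def confidence_norm(alpha, standard_dev, size):
    """Margin of error for a two-sided normal confidence interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha is 0.05
    return z * standard_dev / math.sqrt(size)


# Hypothetical sample: mean 50, standard deviation 10, 100 observations.
sample_mean, standard_dev, size = 50.0, 10.0, 100

margin = confidence_norm(0.05, standard_dev, size)
lower_bound = sample_mean - margin
upper_bound = sample_mean + margin
print(f"95% confidence interval: {lower_bound:.2f} to {upper_bound:.2f}")

# A larger sample or a lower confidence level narrows the interval.
margin_larger_sample = confidence_norm(0.05, standard_dev, 400)
margin_lower_confidence = confidence_norm(0.10, standard_dev, size)
```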
How to find the standard deviation with incomplete information, and then find a cumulative probability
Step 1: Subtract the lower bound of the 95% range from the mean and divide by 1.96 to get the standard deviation.
Step 2: Use the Excel function NORM.DIST(x, mean, standard_dev, TRUE).
Example: "X" can be a student's score
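The two steps above can be sketched in Python. All numbers are hypothetical: test scores with a mean of 75, where 95% of scores fall between 60 and 90.

```python
from statistics import NormalDist

# Hypothetical inputs: mean score and the lower bound of the 95% range.
mean_score = 75.0
lower_bound_95 = 60.0

# Step 1: the lower bound sits 1.96 standard deviations below the mean.
standard_dev = (mean_score - lower_bound_95) / 1.96

# Step 2: cumulative probability, like NORM.DIST(x, mean, standard_dev, TRUE).
scores = NormalDist(mu=mean_score, sigma=standard_dev)
p_at_most_80 = scores.cdf(80)

print(f"standard deviation: {standard_dev:.3f}")
print(f"P(score <= 80):     {p_at_most_80:.3f}")
```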
ALL of the ways you can reduce the width of the confidence interval
Increase the sample size
Decrease the confidence level
Which of the following is the MOST LIKELY result of using a survey with biased questions?
The data in your sample will differ in a systematic way from data based on unbiased random selections from the population.
The null hypothesis is a statement about a topic of interest. It is typically based on historical information or conventional wisdom. We always start a hypothesis test by assuming that the null hypothesis is true and then test to see if we can nullify it—that's why it's called the "null" hypothesis. The null hypothesis is the opposite of the hypothesis we are trying to prove (the alternative hypothesis).
The alternative hypothesis (the opposite of the null hypothesis) is the theory or claim we are trying to substantiate. If our data allow us to nullify the null hypothesis, we substantiate the alternative hypothesis.
Let's return to the movie theater example and focus on the sample taken after the manager changes the theater's artistic focus. Suppose the average satisfaction rating of the sample is 9.9 out of 10. Which of the following do you think would be the correct conclusion? Remember that H0:μ=6.7 and Ha:μ≠6.7.
Reject the null hypothesis
The null hypothesis is that the average satisfaction rating has not changed, that is, that the population mean μ is still equal to 6.7. Drawing a sample with an average satisfaction rating of 9.9 from a population that has an average rating of 6.7 is extremely unlikely, so we would almost certainly reject the null hypothesis and conclude that the average satisfaction rating is no longer 6.7.
Coefficient of variation
Coefficient of Variation Excel Formula
Type I Error
The probability of a type I error is equal to the significance level, which is 1 − confidence level. A 90% confidence level indicates that the significance level is 10%. Therefore there is a 10% chance of making a type I error.
One-Sided p-Value
The one-sided p-value is half of the two-sided p-value. Since the two-sided p-value is 0.0040, the one-sided p-value is 0.0040/2=0.0020.
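The halving relationship can be illustrated with a short Python sketch based on the standard normal distribution. The test statistic of 2.878 is a hypothetical value, chosen only because it gives a two-sided p-value close to 0.0040.

```python
from statistics import NormalDist

z_statistic = 2.878  # hypothetical test statistic

# Probability in one tail beyond the observed statistic.
one_sided_p = 1 - NormalDist().cdf(abs(z_statistic))
# A two-sided test counts both tails, so its p-value is twice as large.
two_sided_p = 2 * one_sided_p

print(f"two-sided p-value: {two_sided_p:.4f}")
print(f"one-sided p-value: {one_sided_p:.4f}")
```

The halving only applies when the sample result lies in the direction stated by the one-sided alternative hypothesis.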
Low R-squared, Low p-value
A low R-squared and a low p-value indicate that the independent variable explains little of the variation in the dependent variable and that the linear relationship between the two variables is significant.
There is more variability at the lower values than at the higher values.