65 terms

A hotel chain sent 2,000 past guests an email asking them to rate the service in the hotel during their most recent visit. Of the 500 who replied, 450 rated the service as excellent.

(a) Identify whether the data are cross sectional or a time series.

(b) Give a name to the variable and indicate if the variable is categorical, ordinal, or numerical.

(c) List any concerns that you might have for the accuracy of the data.

(a) The data are cross sectional.

(b) Variable name: Service Rating. Type: ordinal.

(c) The 500 guests who replied may not be representative of the other 1,500 guests.
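
To see why nonresponse matters here, a short Python sketch can bound the share of all 2,000 guests who would rate the service excellent. The counts come from the question; the two extreme scenarios for the 1,500 nonrespondents are illustrative assumptions, not something the survey tells us.

```python
surveyed = 2000    # guests who were emailed
replied = 500      # guests who responded
excellent = 450    # respondents who rated the service excellent

response_rate = replied / surveyed           # share of guests who answered
share_among_replies = excellent / replied    # share of excellent ratings among replies

nonrespondents = surveyed - replied

# Extreme assumption 1: no nonrespondent would have said "excellent".
lowest_possible = excellent / surveyed
# Extreme assumption 2: every nonrespondent would have said "excellent".
highest_possible = (excellent + nonrespondents) / surveyed

print(response_rate, share_among_replies, lowest_possible, highest_possible)
```

The 90% figure describes only the guests who replied; for all guests the true share could lie anywhere between the two bounds.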

Brand of car owned by drivers

Variable Name = Brand of Car

Type = Categorical

Cases = Drivers

Zip codes are an example of numerical data.

False. Zip codes are numbers, but sensible calculations cannot be performed with them and they do not have measurement units.

Cases is another name for the columns in a data table.

False. Cases is another name for the rows in a data table.

The frequency of a time series refers to the time spacing between rows of data.

True

A column in a data table holds the values associated with an observation.

The statement is false. A row in a data table typically holds the values associated with an observation.

A Likert scale represents numerical data.

False. A Likert scale represents ordinal data.

Aggregation adds further rows to a data table.

False. Aggregation collapses a table into one with fewer rows.
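
A minimal Python sketch of aggregation, using made-up sales rows (the regions and amounts are illustrative assumptions, not data from these cards). Five transaction rows collapse into one row per region.

```python
from collections import defaultdict

# Hypothetical transaction-level table: one row per sale (region, amount in dollars).
sales = [
    ("East", 20.0),
    ("East", 35.0),
    ("West", 10.0),
    ("South", 15.0),
    ("West", 5.0),
]

# Aggregate: add up the amounts within each region.
totals = defaultdict(float)
for region, amount in sales:
    totals[region] += amount

# The aggregated table has one row per region, so it has fewer rows.
aggregated = sorted(totals.items())
print(len(sales), "rows before,", len(aggregated), "rows after")
print(aggregated)
```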

A start-up company built a database of customers and sales information. For each customer, it recorded the customer's name, zip code, region of the country (East, South, Midwest, West), date of last purchase, amount of purchase (in dollars), and item purchased.

(A) Identify whether the data are cross sectional or a time series.

(B) Give a name to each variable and indicate if the variable is categorical, ordinal, or numerical (if a variable is numerical, give its units).

(C) List any concerns that you might have for the accuracy of the data.

(A) The data are CROSS SECTIONAL.

(B) Name, ZIP, Region = Categorical

Last Purchase Date = Ordinal

Amount of Purchase = Numerical (dollars)

Items Purchased = Categorical

(C) The zip code (and presumably the region as well) depends on the honesty of the customer.

If all of the bars in a bar chart have the same length, then the categorical variable shown in the chart has no variation.

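False. A categorical variable has no variation only when all of its values fall in one category. Bars of equal length mean the values are spread evenly across the categories.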

Use a bar chart to show frequencies and a pie chart to show shares of a categorical variable that is not ordinal.

True

A Pareto chart puts the modal category first.

True

The plot to the right shows the holdings, in billions of dollars, of U.S. Treasury bonds in five Asian countries. Is this a proper bar chart or a chart of a table of five numbers that uses bars?

The chart is a proper bar chart because it displays the distribution of a categorical variable using a sequence of bars.

The data shown in the table to the right summarize how 360 executives responded to a question that asked them to list the factors that impede the flow of knowledge within their company. Do these percentages belong together in a pie chart?

No because the categories do not identify groups of one categorical variable.

Describe the bar chart of a categorical variable for which 180 of the 200 rows of data are in one category and the remaining 20 are distributed evenly among two categories.

The bar chart would have one bar of height 180 and two bars of height 10 each.

A categorical variable has two values, male and female. Would you prefer to see a bar chart, a pie chart, or a frequency table? Why?

A bar chart because, for only two counts, its summary will give the most information.

Proportion of automobiles sold during the last quarter by eight major manufacturers

Bar chart or pie chart

Determine which graphical displays would be appropriate for the variable described below. If necessary, indicate whether it would be appropriate to show the frequencies or relative frequencies.

Destinations for travelers leaving the United States and heading abroad

Two Answers

Pie Chart (share)

Bar Chart

Share of software purchases devoted to games, office work, and design

Pie Chart

Auto manufacturers are sensitive to the color preferences of consumers. Would they like to know the modal preference or the median preference?

The modal preference because the median is not defined.
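
A small Python sketch of why the mode works for an unordered categorical variable. The list of color preferences is a made-up illustration.

```python
from collections import Counter

# Hypothetical color preferences; colors have no natural order,
# so there is no meaningful "middle" value to call the median.
colors = ["silver", "white", "black", "white", "red", "white", "black"]

# The mode only needs counts, which any categorical variable provides.
counts = Counter(colors)
modal_color, modal_count = counts.most_common(1)[0]
print(modal_color, modal_count)
```

Sorting the colors alphabetically and picking the middle one would run, but the result would depend on an arbitrary ordering rather than on the preferences themselves.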

Describe your impression of the underlying data table. Do you think that it has a row for every case of soft drinks that was sold?

The underlying data table probably accumulates case sales by brand to some degree. It is unlikely that every case is represented by a row.

How does the presence of a large group of other responses affect the use of these charts?

The large other category dominates the pie charts.

Do you have to show the other category when using the bar chart?

No, the other category is not needed in the bar chart.

If each respondent in the survey listed several areas for research, would it then be appropriate to form a pie chart of the values in a column of the table?

No, because the categories would no longer partition the cases into distinct, non-overlapping subsets.

Do these plots suggest that women-owned businesses concentrate in some industries?

The plots are similar, but women-owned businesses seem to be slightly concentrated in G.

Construct a single plot that directly shows which industries have the highest concentration of women-owned businesses. Would a pie chart of the second row in the table be appropriate?

No, because a pie chart would not show the percentage of women-owned businesses within each industry.

One data table tracks firms owned by women in 1992. The other tracks all firms and was collected three years later in 1995. Is this time gap a problem?

Yes, but only a slight one. Some industries may have changed in size during the time gap, but the change would be relatively small because the categories are broad.

Compare the impression conveyed by these plots of any anticipated trend in outsourcing of white-collar jobs.

The bar chart facilitates comparison and the pie chart makes the relative shares more apparent.

Does the mode differ from the median for either distribution?

In 2003, the median and the mode are the same. In 2008, they are also the same.

What question does this chart answer? What does it conceal?

The pie chart shows that there are large differences in the percentages for the different occupations. It conceals the small percentage of actuaries and statisticians and the large percentage of computer-related occupations.

What is the modal category?

The modal category is computer support specialists.

You should put the labels of the categories in order in a bar chart when showing the frequencies of an ordinal variable.

True

The table to the right summarizes results of a survey of the purchasing habits of 1,800 13- to 25-year-olds in a country in 2006, a much sought-after group of customers. Each row of the table indicates the percentage of participants in the survey who say that particular statement. Complete parts a and b.

(A) Would it be appropriate to summarize these percentages in a single pie chart?

(B) What type of chart would you recommend for these four percentages?

(A) No because the categories do not make up shares of a whole.

(B) A chart with divided bars

Suppose a column measures the amount spent by the last 150 customers at a convenience store. The purchase amounts were coded as Small if less than $8, Typical if between $8 and $22, and Large if more than $22. Would a bar chart be useful? How should the categories be displayed?

The bar chart would be useful because it would show the distribution of purchase amounts in a simple way. Because the categories are ordinal, they should be displayed in their natural order: Small, Typical, Large.

The frequency of a category is the dollar value of the observations in that group.

The statement is false because frequency is the count of the items.

The figure shows the histogram of the annual tuition at 67 top undergraduate business schools.

(A) Estimate from the figure the center and spread of the data. Are the usual notions of center and spread useful for these data?

(B) Describe the shape of the histogram.

(C) If you were only shown the boxplot, would you be able to identify the shape of the distribution of these data?

(D) Can you think of an explanation for the shape of the histogram?

(A) The median is about $14,000 and the interquartile range is about $18,000.

(B) The shape of the histogram is bimodal.

(C) No

(D) The first mode represents public schools and the second mode represents private schools.

The accompanying data table gives the car type and rated horsepower for 30 different models of car available for purchase during a recent year.

(B) Which statement below best describes the histogram?

(C) Which statement below best interprets the histogram?

(D) What does the histogram tell you that the boxplot does not? Select all the choices below that apply.

(E) What does the boxplot tell you that the histogram does not? Select all the choices below that apply.

(B) The histogram is right skewed.

(C) One model of car has much higher horsepower.

(D) The shape of the distribution

(E) The presence of any outliers, the location of the middle half of the values, and the location of the median

Outliers have a more dramatic effect on smaller data sets. For this example, the accompanying data consist of the sizes (in seconds and MB) of the 27 songs on a band's greatest hits album.

(B) Identify any outliers.

(C) What is the effect of excluding this song on the mean and median of the sizes of the songs?

(D) Which summary, the mean or median, is the better summary of the center of the distribution of sizes?

(E) Which summary, the mean or the median, is the more useful summary if you want to know if you can fit this album on your mp3 player?

(B) The song that is the outlier is 6.88 minutes long and 6.44 MB in size.

(C) The mean excluding the outlier is 2.58 MB, whereas the mean including all of the data is 2.72 MB. The median excluding the outlier is 2.41 MB, whereas the median including all of the data is 2.51 MB.

(D) Median

(E) Mean
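
The numbers in this answer can be checked with a little arithmetic. The sketch below uses only the summary values quoted on the card (27 songs, mean 2.72 MB, outlier 6.44 MB); the individual song sizes are not available here, so the medians are not recomputed.

```python
n_songs = 27
mean_all = 2.72   # MB, mean of all songs (from the card)
outlier = 6.44    # MB, size of the outlying song (from the card)

# The mean times the count recovers the total size of the album,
# which is why the mean is the useful summary for part (E).
total_all = n_songs * mean_all

# Dropping the outlier removes its size from the total and one song from the count.
mean_without = (total_all - outlier) / (n_songs - 1)

print(round(total_all, 2), round(mean_without, 2))
```

The recomputed mean without the outlier rounds to 2.58 MB, matching part (C).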

A histogram with a long left tail.

Skewed

Position of a peak in the histogram.

Mode

The boxplot shows the mean plus or minus one standard deviation of the data.

False. The boxplot shows the median, with the lower edge at the 25th percentile point and the upper edge at the 75th percentile point.
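
A short Python sketch of the three values that define the box. The data are made up, and the `inclusive` quartile method is one of several conventions; other software may report slightly different quartiles.

```python
import statistics

# Hypothetical data with one large value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]

# The box in a boxplot is drawn from the quartiles: the 25th percentile,
# the median, and the 75th percentile.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

# The mean and standard deviation are computed only for comparison;
# they are not what the box displays.
mean = statistics.mean(data)
sd = statistics.stdev(data)

print("box:", q1, median, q3)
print("mean:", round(mean, 2), "sd:", round(sd, 2))
```

In this example the single large value pulls the mean above the upper edge of the box, while the median stays in the middle of the bulk of the data.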

If the standard deviation of a variable is 0, then the mean is equal to the median.

True

Which has larger standard deviation, the distribution of weekly allowances to 12-year-olds or the distribution of monthly household mortgage payments? Would the same distribution also have the larger coefficient of variation?

(A) Which distribution has larger standard deviation?

(B) Would the same distribution also have the larger coefficient of variation?

(A) The distribution of monthly household mortgage payments

(B) It is impossible to tell with the given information.
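
The coefficient of variation is the standard deviation divided by the mean. The sketch below uses invented dollar summaries (assumptions for illustration only) to show why the answer to (B) depends on information the question does not give.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation divided by the mean (a unit-free ratio)."""
    return sd / mean

# Hypothetical summaries, in dollars.
allowance_cv = coefficient_of_variation(sd=5, mean=10)

# Two hypothetical mortgage scenarios. Both have a far larger SD than allowances.
mortgage_cv_a = coefficient_of_variation(sd=600, mean=1500)
mortgage_cv_b = coefficient_of_variation(sd=900, mean=1500)

print(allowance_cv, mortgage_cv_a, mortgage_cv_b)
```

In scenario A the mortgage payments have the smaller coefficient of variation, in scenario B the larger one, even though their standard deviation is much larger in both.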

The number of standard deviations from the mean.

Z Score
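
A minimal Python sketch of a z-score, z = (x - mean) / SD. The values are made up, and the population standard deviation (`pstdev`) is used only so the numbers come out round; a course that uses the sample standard deviation would call `statistics.stdev` instead.

```python
import statistics

# Hypothetical values chosen so that the mean is 5 and the population SD is 2.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(values)
sd = statistics.pstdev(values)   # assumption: population SD, for round numbers

def z_score(x):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

print(z_score(9.0), z_score(4.0))
```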

If the median size used by 550 songs is 3.2 MB, will these all fit on a device that has 2 GB of storage? Can you tell?

From the given information, it is not possible to determine if these songs will fit on the device.
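
A Python sketch of why the median is not enough. Both collections below are invented, each has 550 songs with a median of 3.2 MB, and the capacity assumes 1 GB = 1,000 MB.

```python
import statistics

capacity_mb = 2000   # assumption: 2 GB treated as 2,000 MB

# Two hypothetical collections of 550 song sizes (MB) with the same median.
light = [1.0] * 274 + [3.2] * 2 + [3.3] * 274
heavy = [3.1] * 274 + [3.2] * 2 + [10.0] * 274

for name, sizes in [("light", light), ("heavy", heavy)]:
    total = sum(sizes)
    print(name, len(sizes), statistics.median(sizes), round(total, 1), total <= capacity_mb)
```

The first collection fits and the second does not. The total size is the count times the mean, so the mean is the summary that would settle the question.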

An environmental act mandates that gasoline sold in a certain country must average at least 2.71% ethanol by the following year. Does this mean that every gallon of gas sold has to include ethanol?

Every gallon of gasoline sold does not have to include ethanol. The calculation of an average does not require that every gallon of gasoline contain ethanol.

An adjustable rate mortgage allows the rate of interest to fluctuate over the term of the loan, depending on economic conditions. A fixed rate mortgage holds the rate of interest constant. Which sequence of monthly payments has smaller variance, those on an adjustable rate mortgage or those on a fixed rate mortgage?

Since the variance for the fixed rate mortgage is equal to zero and the variance for the adjustable rate mortgage is greater than or equal to zero, the sequence of monthly payments for the fixed rate mortgage has smaller variance.

Wider bins produce a histogram with fewer modes than would be found in a histogram with very narrow bins.

True
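
One way to see this is to bin the same data twice and count the peaks. The data and the simple peak-counting rule below are illustrative assumptions; this shows a single instance of the tendency, not a proof.

```python
from collections import Counter

# Hypothetical whole-number data from 1 to 9.
data = [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5, 5, 6, 7, 7, 7, 8, 9, 9]

def histogram(values, width):
    """Bar heights for bins of the given width, starting at 1."""
    counts = Counter((v - 1) // width for v in values)
    n_bins = max(counts) + 1
    return [counts.get(b, 0) for b in range(n_bins)]

def count_peaks(heights):
    """Count bars that are strictly taller than both neighbors."""
    padded = [0] + heights + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )

narrow = histogram(data, width=1)
wide = histogram(data, width=3)
print(narrow, count_peaks(narrow))
print(wide, count_peaks(wide))
```

With bins of width 1 the histogram is jagged and shows several peaks; with bins of width 3 the same data show a single peak.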

The mean is

the balance point of the histogram.

The standard deviation is

a measure of the spread of the histogram.

When the mean is greater than the median, the distribution is skewed left.

False. When the mean is greater than the median, the distribution is skewed right.
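
A quick Python check with made-up values that have a long right tail (one very large value):

```python
import statistics

# Hypothetical incomes in thousands of dollars; 250 creates a long right tail.
incomes = [20, 25, 30, 35, 40, 45, 250]

mean = statistics.mean(incomes)
median = statistics.median(incomes)

# The large value pulls the mean toward the tail; the median barely moves.
print(round(mean, 2), median)
```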

Table of cross-classified counts

contingency table

Measure of association between two categorical variables that grows with increased sample size

chi-squared

Conditional distribution matches marginal distribution

not associated

Produced by a variable lurking behind a table

Simpson's paradox
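
A sketch of Simpson's paradox in Python. The two managers, the order types, and all counts are invented for illustration; order difficulty plays the role of the lurking variable.

```python
# Hypothetical counts: (orders processed without a problem, total orders).
results = {
    "Manager A": {"easy": (90, 100), "hard": (300, 1000)},
    "Manager B": {"easy": (800, 1000), "hard": (20, 100)},
}

def rate(ok, total):
    return ok / total

def overall_rate(by_type):
    """Success rate after combining the easy and hard orders."""
    ok = sum(o for o, _ in by_type.values())
    total = sum(t for _, t in by_type.values())
    return ok / total

for manager, by_type in results.items():
    print(
        manager,
        "easy:", rate(*by_type["easy"]),
        "hard:", rate(*by_type["hard"]),
        "overall:", round(overall_rate(by_type), 3),
    )
```

Manager A does better on easy orders and on hard orders, yet looks worse overall because A handles mostly hard orders.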

We can fill in the cells of the contingency table from the marginal counts alone if the two categorical variables are not associated.

The statement is true.
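
When the variables are not associated, each cell count equals (row total × column total) / n. A minimal Python sketch with made-up marginal counts:

```python
# Hypothetical marginal counts for a 2 x 2 table.
row_totals = [60, 40]
col_totals = [70, 30]
n = sum(row_totals)   # equals sum(col_totals)

# Under no association, every cell is determined by the margins.
cells = [[r * c / n for c in col_totals] for r in row_totals]
print(cells)

# Each row then has the same share in the first column as the column margin does.
first_column_shares = [row[0] / sum(row) for row in cells]
print(first_column_shares, col_totals[0] / n)
```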

The percentages of cases in the first column within each row of a contingency table are the same if the variables are not associated.

True

The value of chi-squared depends on the number of observations in a contingency table

True

The value of chi-squared depends on which variable defines the rows and which defines the columns of the contingency table

This statement is false. The value of chi-squared is the same in either case because the row and column definitions are arbitrary.
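
Both chi-squared properties on the cards above can be checked with a small Python sketch. The table of counts is hypothetical; the function uses the usual formula, the sum over cells of (observed - expected)² / expected, with expected counts taken from the margins.

```python
def chi_squared(table):
    """Chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total

# Hypothetical 2 x 3 table of counts.
table = [[30, 20, 10], [20, 20, 20]]

# Same proportions, twice as many observations.
doubled = [[2 * count for count in row] for row in table]

# Swap the roles of rows and columns.
transposed = [list(col) for col in zip(*table)]

print(chi_squared(table), chi_squared(doubled), chi_squared(transposed))
```

Doubling every count doubles chi-squared even though the proportions are unchanged, while swapping rows and columns leaves it the same.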

If the categorical variable that identifies the supervising manager is associated with the categorical variable that indicates a problem with processing orders, then the manager is causing the problems.

False. Due to the possible presence of a lurking variable, association cannot be interpreted as causation.

If the percentage of female job candidates who are hired is larger than the percentage of male candidates who are hired, then there is association between the categorical variables Sex (male, female) and Hire (yes, no).

True

The accompanying table summarizes the status of 1000 loans made by a bank. Each loan either ended in default or was repaid. Loans were divided into large (more than $50,000) or small size.

(A) Determine what it would mean to find association between the variables for loan size and payment status.

(B) Based on the given table, is there association between the variables?

(A) Large and small loans have different chances of being repaid.

(B) Yes, because the payment statuses among large and small loans are not approximately the same.

Numerous epidemiological studies associate a history of smoking with the presence of lung cancer. If a study finds association (cancer rates are higher among smokers), does this mean that smoking causes cancer?

This association does not mean that smoking causes cancer because association is not the same as causation.

Shown in the bar chart of a categorical variable

marginal distribution

A study of purchases at a 24-hour supermarket recorded two categorical variables: the time of the purchase (8 A.M. to 8 P.M. vs. late night) and whether the purchase was made by someone with children present. Would you expect these variables to be associated?

Yes. Fewer shoppers with children present would be expected during late night.