A hotel chain sent​ 2,000 past guests an email asking them to rate the service in the hotel during their most recent visit. Of the 500 who​ replied, 450 rated the service as excellent.
(a) Identify whether the data are cross sectional or a time series.
(b) Give a name to the variable and indicate if the variable is​ categorical, ordinal, or numerical
(c) List any concerns that you might have for the accuracy of the data
​(a) The data are cross sectional.
(b) Variable name: Service Rating. Type: ordinal.
(c) The data for the 500 guests who replied may not be representative of the other 1,500 guests.
Brand of car owned by drivers
Variable Name = Brand of Car
Type = Categorical
Cases = Drivers
Zip codes are an example of numerical data.
False. Zip codes are​ numbers, but sensible calculations cannot be performed with them and they do not have measurement units.
Cases is another name for the columns in a data table.
False. Cases is another name for the rows in a data table.
The frequency of a time series refers to the time spacing between rows of data.
True
A column in a data table holds the values associated with an observation.
The statement is false. A row in a data table typically holds the values associated with an observation.
A Likert scale represents numerical data.
False. A Likert scale represents ordinal data.
Aggregation adds further rows to a data table.
False. Aggregation collapses a table into one with fewer rows.
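A minimal sketch of aggregation, assuming a made-up set of transaction rows (the regions and amounts below are illustration data, not from any exercise). Summing amounts by region collapses five rows into two:

```python
from collections import defaultdict

# Hypothetical transaction-level rows: (region, amount in dollars)
rows = [
    ("East", 20.0),
    ("West", 15.0),
    ("East", 5.0),
    ("West", 10.0),
    ("East", 30.0),
]

def aggregate_by_region(rows):
    """Collapse transaction rows into one row per region (total amount)."""
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return sorted(totals.items())

aggregated = aggregate_by_region(rows)
print(len(rows), "rows before;", len(aggregated), "rows after")
```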
A​ start-up company built a database of customers and sales information. For each​ customer, it recorded the​ customer's name, zip​ code, region of the country​ (East, South,​ Midwest, West), date of last​ purchase, amount of purchase​ (in dollars), and item purchased.
(A) Identify whether the data are cross sectional or a time series.
(B) Give a name to each variable and indicate if the variable is​ categorical, ordinal, or numerical​ (if a variable is​ numerical
(C) List any concerns that you might have for the accuracy of the data.
(A) The data are CROSS SECTIONAL.
(B) Name, ZIP, Region = Categorical
Last Purchase Date = Ordinal
Amount of purchase = Numerical/ Dollars
Items Purchased = Categorical
(C) The zip code​ (and presumably the region as​ well) depends on the honesty of the customer.
If all of the bars in a bar chart have the same​ length, then the categorical variable shown in the chart has no variation.
False. Bars of equal length mean the cases are spread evenly across the categories. A categorical variable has no variation only when every value falls in one category.
Use a bar chart to show frequencies and a pie chart to show shares of a categorical variable that is not ordinal.
True
A Pareto chart puts the modal category first.
True
The plot to the right shows the holdings, in billions of dollars, of U.S. Treasury bonds in five Asian countries. Is this a proper bar chart or a chart of a table of five numbers that uses bars?
The chart is a proper bar chart because it displays the distribution of a categorical variable using a sequence of bars
The data shown in the table to the right summarize how 360 executives responded to a question that asked them to list the factors that impede the flow of knowledge within their company. Do these percentages belong together in a pie​ chart?
No because the categories do not identify groups of one categorical variable.
Describe the bar chart of a categorical variable for which 180 of the 200 rows of data are in one category and the remaining 20 are distributed evenly among two categories.
The bar chart would have one bar of height 180 and two bars of height 10 each.
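The frequencies behind that bar chart can be sketched as follows. The category labels "A", "B", and "C" are placeholders, since the exercise does not name the categories:

```python
from collections import Counter

# Hypothetical column of 200 values: 180 in "A", 10 each in "B" and "C"
column = ["A"] * 180 + ["B"] * 10 + ["C"] * 10

counts = Counter(column)
for category, count in counts.most_common():
    # One '#' per 10 rows gives a rough text bar chart
    print(f"{category}: {'#' * (count // 10)} ({count})")
```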
A categorical variable has two​ values, male and female. Would you prefer to see a bar​ chart, a pie​ chart, or a frequency​ table? Why?
A bar chart​ because, for only two​ counts, its summary will give the most information.
Proportion of automobiles sold during the last quarter by eight major manufacturers
Bar chart or pie chart
Determine which graphical displays would be appropriate for the variable described below. If​ necessary, indicate whether it would be appropriate to show the frequencies or relative frequencies.

Destinations for travelers leaving the United States and heading abroad
Pie Chart (share)
Bar Chart
Share of software purchases devoted to​ games, office​ work, and design
Pie Chart
Auto manufacturers are sensitive to the color preferences of consumers. Would they like to know the modal preference or the median​ preference?
The modal preference because the median is not defined.
Describe your impression of the underlying data table. Do you think that it has a row for every case of soft drinks that was​ sold?
The underlying data table probably accumulates case sales by brand to some degree. It is unlikely that every case is represented by a row.
How does the presence of a large group of other responses affect the use of these​ charts?
The large other category dominates the pie charts.
Do you have to show the other category when using the bar​ chart?
No, the other category is not needed in the bar chart.
If each respondent in the survey listed several areas for​ research, would it then be appropriate to form a pie chart of the values in a column of the​ table?
​No, because the categories would no longer partition the cases into​ distinct, non-overlapping subsets.
Do these plots suggest that​ women-owned business concentrate in some​ industries?
The plots are​ similar, but​ women-owned business seems to be slightly concentrated in G.
Construct a single plot that directly shows which industries have the highest concentration of​ women-owned businesses. Would a pie chart of the second row in the table be​ appropriate?
No, because a pie chart would not show the percentage of​ women-owned businesses within each industry.
One data table tracks firms owned by women in 1992. The other tracks all firms and was collected three years later in 1995. Is this time gap a​ problem?
Yes, but a slight one because some industries may have changed in number during the time​ gap, but the change would be relatively small due to the broad nature of the categories.
Compare the impression conveyed by these plots of any anticipated trend in outsourcing of​ white-collar jobs.
The bar chart facilitates comparison and the pie chart makes the relative shares more apparent.
Does the mode differ from the median for either​ distribution?
In 2003, the median and the mode are the same. In 2008, they are also the same.
What question does this chart​ answer? What does it​ conceal?
The pie chart shows that there are large differences in the percentages for the different occupations. It conceals the small percentage of actuaries and statisticians and the large percentage of​ computer-related occupations.
What is the modal​ category?
The modal category is computer support specialists.
You should put the labels of the categories in order in a bar chart when showing the frequencies of an ordinal variable.
True
The table to the right summarizes results of a survey of the purchasing habits of​ 1,800 13- to​ 25-year-olds in a country in​ 2006, a much​ sought-after group of customers. Each row of the table indicates the percentage of participants in the survey who say that particular statement. Complete parts a and b.
(A) Would it be appropriate to summarize these percentages in a single pie​ chart?
(B) What type of chart would you recommend for these four percentages?
(A) No because the categories do not make up shares of a whole.
(B) A chart with divided bars
Suppose a column measures the amount spent by the last 150 customers at a convenience store. The purchase amounts were coded as Small if less than \$8, Typical if between \$8 and \$22, and Large if more than \$22. Would a bar chart be useful? How should the categories be displayed?
The bar chart would be useful because it would show the distribution of purchase amounts in a simple way. Because the coded variable is ordinal, the categories should be displayed in their natural order: Small, Typical, Large.
The frequency of a category is the dollar value of the observations in that group.
The statement is false because frequency is the count of the items.
The figure shows the histogram of the annual tuition at 67 top undergraduate business schools.
(A) Estimate from the figure the center and spread of the data. Are the usual notions of center and spread useful for these​ data?
(B) Describe the shape of the histogram.
(C) If you were only shown the​ boxplot, would you be able to identify the shape of the distribution of these​ data?
(D) Can you think of an explanation for the shape of the​ histogram?
(A) The median is about ​\$14,000 and the interquartile range is about ​\$18,000.
(B) The shape of the histogram is bimodal.
(C) No
(D) The first mode represents public schools and the second mode represents private schools.
The accompanying data table gives the car type and rated horsepower for 30 different models of car available for purchase during a recent year.
(B) Which statement below best describes the​ histogram?
(C) Which statement below best interprets the​ histogram?
(D) What does the histogram tell you that the boxplot does​ not? Select all the choices below that apply.
(E) What does the boxplot tell you that the histogram does​ not? Select all the choices below that apply.
(B) The histogram is right skewed.
(C) One model of car has much higher horsepower.
(D) The shape of the distribution
(E) The presence of any outliers; the location of the middle half of the values; the location of the median
Outliers have a more dramatic effect on smaller data sets. For this​ example, the accompanying data consist of the sizes​ (in seconds and​ MB) of the 27 songs on a​ band's greatest hits album.
(B) Identify any outliers.
(C) What is the effect of excluding this song on the mean and median of the sizes of the​ songs?
(D) Which​ summary, the mean or​ median, is the better summary of the center of the distribution of​ sizes?
(E) Which​ summary, the mean or the​ median, is the more useful summary if you want to know if you can fit this album on your mp3​ player?
(B) The outlier is the song that is 6.88 minutes long and 6.44 MB in size.
(C) The mean excluding the outlier is 2.58 MB, whereas the mean including all of the data is 2.72 MB. The median excluding the outlier is 2.41 MB, whereas the median including all of the data is 2.51 MB.
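As a rough check on the figures in this answer, the mean without the outlier can be recovered from the overall mean, the number of songs, and the outlier's size. This is only a sketch: the inputs are the rounded values quoted in the card, so the result is approximate.

```python
n_songs = 27
mean_all = 2.72      # MB, mean of all 27 songs (rounded value from the answer)
outlier_size = 6.44  # MB, size of the outlying song (from the answer)

# The mean times the count gives the total size of the album
total_all = n_songs * mean_all

# Remove the outlier from the total and divide by the remaining 26 songs
mean_without = (total_all - outlier_size) / (n_songs - 1)
print(round(mean_without, 2))
```

The same total (mean times count) is why the mean, not the median, answers the question of whether the album fits on a player.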
(D) Median
(E) Mean
A histogram with a long left tail.
Skewed
Position of a peak in the histogram.
Mode
The boxplot shows the mean plus or minus one standard deviation of the data.
False. The boxplot shows the​ median, with the lower edge at the 25th percentile point and the upper edge at the 75th percentile point.
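The three values that define the box can be computed as in the sketch below. The data are hypothetical, and Python's `statistics.quantiles` uses its default "exclusive" method here; other software may use slightly different quartile conventions.

```python
from statistics import quantiles

# Hypothetical data, already sorted for readability
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 returns the three cut points that split the data into quartiles
q1, median, q3 = quantiles(data, n=4)
iqr = q3 - q1  # length of the box in the boxplot
print(q1, median, q3, iqr)
```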
If the standard deviation of a variable is​ 0, then the mean is equal to the median.
True
Which has larger standard​ deviation, the distribution of weekly allowances to​ 12-year-olds or the distribution of monthly household mortgage​ payments? Would the same distribution also have the larger coefficient of​ variation?
(A) Which distribution has larger standard​ deviation?
(B) Would the same distribution also have the larger coefficient of​ variation?
(A) The distribution of monthly household mortgage payments
(B) It is impossible to tell with the given information.
The number of standard deviations from the mean.
Z Score
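A minimal sketch of the z-score calculation, assuming the sample standard deviation (divisor n - 1) and a tiny hypothetical data set:

```python
from statistics import mean, stdev

def z_score(x, data):
    """Number of standard deviations that x lies from the mean of data."""
    return (x - mean(data)) / stdev(data)

data = [2, 4, 6]  # hypothetical values: mean 4, sample SD 2
print(z_score(6, data))  # 6 is one standard deviation above the mean
```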
If the median size used by 550 songs is 3.2 ​MB, will these all fit on a device that has 2 GB of​ storage? Can you​ tell?
From the given​ information, it is not possible to determine if these songs will fit on the device.
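To see why the median is not enough, the sketch below builds two hypothetical collections of 550 songs that share a median of 3.2 MB but have very different totals. The song sizes are invented, and 2 GB is taken as 2,000 MB for simplicity.

```python
from statistics import median

CAPACITY_MB = 2000  # assumption: 2 GB treated as 2,000 MB

# Collection A: every song is 3.2 MB
songs_a = [3.2] * 550

# Collection B: just over half the songs are 3.2 MB, the rest are 10 MB
songs_b = [3.2] * 276 + [10.0] * 274

for name, songs in (("A", songs_a), ("B", songs_b)):
    total = sum(songs)
    print(name, "median:", median(songs), "total MB:", round(total, 1),
          "fits:", total <= CAPACITY_MB)
```

Knowing the mean instead would settle the question, because the mean times the number of songs is the total size.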
An environmental act mandates that gasoline sold in a certain country must average at least 2.71% ethanol by the following year. Does this mean that every gallon of gas sold has to include​ ethanol?
Every gallon of gasoline sold does not have to include ethanol. The calculation of an average does not require that every gallon of gasoline contain ethanol.
An adjustable rate mortgage allows the rate of interest to fluctuate over the term of the​ loan, depending on economic conditions. A fixed rate mortgage holds the rate of interest constant. Which sequence of monthly payments has smaller​ variance, those on an adjustable rate mortgage or those on a fixed rate​ mortgage?
Since the variance for the fixed rate mortgage is equal to zero and the variance for the adjustable rate mortgage is greater than or equal to​ zero, the sequence of monthly payments for the fixed rate mortgage has smaller variance.
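A short illustration with invented payment amounts: a constant sequence has variance exactly zero, while any sequence that changes has positive variance.

```python
from statistics import pvariance

# Hypothetical monthly payments over one year, in dollars
fixed_payments = [1200.0] * 12
adjustable_payments = [1150.0] * 4 + [1200.0] * 4 + [1275.0] * 4

fixed_var = pvariance(fixed_payments)            # constant payments
adjustable_var = pvariance(adjustable_payments)  # payments that change
print(fixed_var, adjustable_var)
```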
Wider bins produce a histogram with fewer modes than would be found in a histogram with very narrow bins.
True
The mean is
the balance point of the histogram.
The standard deviation is
a measure of the spread of the histogram.
When the mean is greater than the​ median, the distribution is skewed left.
False. When the mean is greater than the​ median, the distribution is skewed right.
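A small made-up example of the typical pattern: one large value in the right tail pulls the mean above the median.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one is large
data = [1, 2, 2, 3, 3, 3, 4, 20]

data_mean = mean(data)
data_median = median(data)
print("mean:", data_mean, "median:", data_median)
```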
Table of​ cross-classified counts
contingency table
Measure of association between two categorical variables that grows with increased sample size
​chi-squared
Conditional distribution matches marginal distribution
not associated
Produced by a variable lurking behind a table
Simpson's paradox
We can fill in the cells of the contingency table from the marginal counts alone if the two categorical variables are not associated.
The statement is true.
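When the variables are not associated, each cell equals (row total × column total) / n. The sketch below applies that rule to hypothetical marginal counts; `expected_counts` is an illustrative helper name, not a library function.

```python
def expected_counts(row_totals, col_totals):
    """Fill in a contingency table from its margins, assuming no association."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must sum to the same n")
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical margins for a 2x2 table with n = 100
cells = expected_counts([60, 40], [70, 30])
for row in cells:
    print(row)
```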
The percentages of cases in the first column within each row of a contingency table are the same if the variables are not associated.
True
The value of​ chi-squared depends on the number of observations in a contingency table
True
The value of​ chi-squared depends on which variable defines the rows and which defines the columns of the contingency table
This statement is false. The value of​ chi-squared is the same in either case because the row and column definitions are arbitrary.
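Both properties of chi-squared (it grows with the number of observations, and it is unchanged when rows and columns are swapped) can be checked with a small sketch. The table is hypothetical and `chi_squared` is an illustrative helper written for this example.

```python
import math

def chi_squared(table):
    """Chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were not associated
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

table = [[30, 10], [20, 40]]                       # hypothetical counts
transposed = [list(col) for col in zip(*table)]    # swap rows and columns
doubled = [[2 * x for x in row] for row in table]  # same shares, twice the cases

print(chi_squared(table), chi_squared(transposed), chi_squared(doubled))
```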
If the categorical variable that identifies the supervising manager is associated with the categorical variable that indicates a problem with processing​ orders, then the manager is causing the problems.
False. Due to the possible presence of a lurking​ variable, association cannot be interpreted as causation.
If the percentage of female job candidates who are hired is larger than the percentage of male candidates who are​ hired, then there is association between the categorical variables Sex​ (male, female) and Hire​ (yes, no).
True
The accompanying table summarizes the status of 1000 loans made by a bank. Each loan either ended in default or was repaid. Loans were divided into large​ (more than​ \$50,000) or small size.
(A) Determine what it would mean to find association between the variables for loan size and payment status.
(B) Based on the given​ table, is there association between the​ variables?
(A) Large and small loans have different chances of being repaid.
(B) Yes, because the payment statuses among large and small loans are not approximately the same.
Numerous epidemiological studies associate a history of smoking with the presence of lung cancer. If a study finds association​ (cancer rates are higher among​ smokers), does this mean that smoking causes​ cancer?
This association does not mean that smoking causes cancer because association is not the same as causation.
Shown in the bar chart of a categorical variable
marginal distribution
A study of purchases at a 24-hour supermarket recorded two categorical variables: the time of the purchase (8 A.M. to 8 P.M. vs. late night) and whether the purchase was made by someone with children present. Would you expect these variables to be associated?
Yes. Fewer shoppers with children present would be expected during late night.