Stats Final
Key Concepts:
Terms in this set (60)
Which of the following reasons is responsible for the increase in the use of data-mining techniques in business?
a. The lack of methods to electronically track data
b. The dearth of information to analyze and interpret
c. The ability to electronically warehouse data
d. The ability to manually analyze all the data
c. The ability to electronically warehouse data
Observation refers to the:
a. estimated continuous outcome variable.
b. set of recorded values of variables associated with a single entity.
c. goal of predicting a categorical outcome based on a set of variables.
d. mean of all variable values associated with one particular entity
b. set of recorded values of variables associated with a single entity.
_____ is a category of data-mining techniques that detect patterns and relationships in the data.
a. Descriptive data-mining
b. Predictive data-mining
c. Machine learning
d. Artificial intelligence
a. Descriptive data-mining
The data-mining method that can be used in market segmentation to divide consumers into different homogeneous groups is _____.
a. data visualization
b. cluster analysis
c. market analysis
d. supervised learning
b. cluster analysis
Which of the following is true of bottom-up hierarchical clustering?
a. All observations are put in a mega-cluster to begin with.
b. Each of the large clusters is broken down iteratively.
c. It starts with each observation in its own cluster and then iteratively combines the two most similar clusters.
d. At the end of the process, observations in the same cluster have maximum distance.
c. It starts with each observation in its own cluster and then iteratively combines the two most similar clusters.
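The bottom-up procedure described in this card can be sketched in a few lines. This is only an illustration, assuming Euclidean distance and the closest-pair (single linkage) rule for "most similar"; the function name `agglomerate`, the `target_clusters` stopping parameter, and the sample points are hypothetical and not from the course material.

```python
from itertools import combinations
from math import dist


def agglomerate(points, target_clusters=1):
    """Bottom-up clustering: start with one cluster per observation, then merge."""
    clusters = [[p] for p in points]  # every observation begins in its own cluster
    merges = []                       # record of which clusters were combined, in order

    def closest_pair_distance(pair):
        # Single linkage (an assumption): distance between the two closest members.
        i, j = pair
        return min(dist(p, q) for p in clusters[i] for q in clusters[j])

    while len(clusters) > target_clusters:
        i, j = min(combinations(range(len(clusters)), 2), key=closest_pair_distance)
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters, merges


# Hypothetical one-variable observations.
sample_points = [(0.0,), (1.0,), (5.0,), (5.5,)]
final_clusters, merge_history = agglomerate(sample_points, target_clusters=2)
```

With these points, the first merge joins 5.0 and 5.5 (the closest pair), and the second joins 0.0 and 1.0. Letting the loop run to a single cluster gives the full sequence of nested clusters that a dendrogram would display.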
k-means clustering is the process of:
a. agglomerating observations into a series of nested groups based on a measure of similarity.
b. organizing observations into one of k groups based on a measure of similarity.
c. reducing the number of variables to consider in a data-mining approach.
d. estimating the value of a continuous outcome variable.
b. organizing observations into one of k groups based on a measure of similarity.
The simplest measure of similarity between observations consisting solely of categorical variables is given by _____.
a. the Euclidean distance
b. the standardized Euclidean distance
c. matching coefficient
d. Jaccard's coefficient
c. matching coefficient
Jaccard's coefficient is different from the matching coefficient in that the former:
a. measures overlap while the latter measures dissimilarity.
b. does not count matching zero entries while the latter does.
c. deals with categorical variables while the latter deals with continuous variables.
d. is affected by the scale used to measure variables while the latter is not.
b. does not count matching zero entries while the latter does.
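A small sketch may help keep the two coefficients apart. It assumes each observation is a list of 0/1 values; the function names and the two sample observations are made up for illustration, and returning 0.0 when both observations are all zeros is a convention chosen here, since the Jaccard ratio is undefined in that case.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (1-1 and 0-0 both count)."""
    agreements = sum(1 for a, b in zip(u, v) if a == b)
    return agreements / len(u)


def jaccard_coefficient(u, v):
    """Same idea, but matching zero entries are left out of the count entirely."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    not_both_zero = sum(1 for a, b in zip(u, v) if not (a == 0 and b == 0))
    # Assumption: report 0.0 when every entry is a 0-0 match (ratio undefined).
    return both_one / not_both_zero if not_both_zero else 0.0


# Hypothetical binary observations.
obs_a = [1, 0, 0, 1, 0]
obs_b = [1, 0, 0, 0, 0]

print(matching_coefficient(obs_a, obs_b))  # 0.8
print(jaccard_coefficient(obs_a, obs_b))   # 0.5
```

The three shared zeros raise the matching coefficient to 0.8, while Jaccard's coefficient ignores them and reports 0.5.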
Single linkage measures dissimilarity between two clusters by considering:
a. the two most distant observations in these clusters.
b. the average dissimilarity over all pairs of observations between these clusters.
c. only the two closest observations in these clusters.
d. the distance between the cluster centroids.
c. only the two closest observations in these clusters.
_____ measures dissimilarity between two clusters by considering only the two most distant observations in these clusters.
a. Single linkage
b. Complete linkage
c. Centroid linkage
d. Average group linkage
b. Complete linkage
Average group linkage measures dissimilarity between two clusters by considering:
a. only the two most dissimilar observations in these clusters.
b. the average distance over all pairs of observations between these clusters.
c. only the two closest observations in these clusters.
d. the distance between the cluster centroids.
b. the average distance over all pairs of observations between these clusters.
_____ measures dissimilarity between two clusters by using the distance between the two cluster centroids.
a. Single linkage
b. Complete linkage
c. Group average linkage
d. Centroid linkage
d. Centroid linkage
_____ is the vector of the averages computed for each variable across all cluster observations.
a. Euclidean distance
b. Matching coefficient
c. Jaccard's coefficient
d. Centroid
d. Centroid
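The four linkage measures and the centroid from the preceding cards can be compared side by side. The sketch below assumes Euclidean distance as the dissimilarity measure; the function names and the two sample clusters are illustrative only.

```python
from itertools import product
from math import dist


def centroid(cluster):
    """Vector of per-variable averages across all observations in the cluster."""
    n = len(cluster)
    return tuple(sum(values) / n for values in zip(*cluster))


def single_linkage(c1, c2):
    """Distance between the two closest observations, one from each cluster."""
    return min(dist(p, q) for p, q in product(c1, c2))


def complete_linkage(c1, c2):
    """Distance between the two most distant observations, one from each cluster."""
    return max(dist(p, q) for p, q in product(c1, c2))


def average_linkage(c1, c2):
    """Average distance over all between-cluster pairs of observations."""
    return sum(dist(p, q) for p, q in product(c1, c2)) / (len(c1) * len(c2))


def centroid_linkage(c1, c2):
    """Distance between the two cluster centroids."""
    return dist(centroid(c1), centroid(c2))


# Hypothetical clusters of two-variable observations.
cluster_1 = [(0.0, 0.0), (0.0, 2.0)]
cluster_2 = [(3.0, 0.0), (5.0, 0.0)]

for measure in (single_linkage, complete_linkage, average_linkage, centroid_linkage):
    print(measure.__name__, round(measure(cluster_1, cluster_2), 3))
```

For any pair of clusters the single linkage value is the smallest and the complete linkage value is the largest, with the average falling in between.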
A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a _____.
a. dendrogram
b. scatter chart
c. decision tree
d. box-plot
a. dendrogram
If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations?
a. the average of the sum of both legs
b. the hypotenuse
c. the small leg
d. the long leg
b. the hypotenuse
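As a quick check of the right-triangle picture, the differences on each variable act as the legs and the Euclidean distance is the hypotenuse. The two observations below are made up for illustration.

```python
from math import dist, hypot

# Hypothetical observations measured on two variables.
obs_u = (1.0, 2.0)
obs_v = (4.0, 6.0)

leg_x = abs(obs_v[0] - obs_u[0])  # difference on the first variable: 3.0
leg_y = abs(obs_v[1] - obs_u[1])  # difference on the second variable: 4.0

print(hypot(leg_x, leg_y))  # 5.0, the hypotenuse
print(dist(obs_u, obs_v))   # 5.0, the same value computed directly
```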
The endpoint of a k-means clustering algorithm occurs when:
a. Euclidean distance between clusters is minimum.
b. Euclidean distance between observations in a cluster is maximum.
c. no further changes are observed in cluster structure and number.
d. all of the observations are encompassed within a single large cluster with mean k.
c. no further changes are observed in cluster structure and number.
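The stopping rule in this card can be seen in a minimal k-means loop. This is a sketch under several assumptions: Euclidean distance, starting centroids supplied by the caller, and a hypothetical `max_iter` safety cap; it is not the procedure of any particular software package.

```python
from math import dist


def k_means(points, initial_centroids, max_iter=100):
    """Assign each point to its nearest centroid, recompute centroids, repeat."""
    centroids = list(initial_centroids)  # copy so the caller's list is untouched
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # endpoint: no observation changed cluster
        assignment = new_assignment
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids


# Hypothetical observations and starting centroids, with k = 2.
sample = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
labels, centers = k_means(sample, [(0.0, 0.0), (10.0, 10.0)])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [(1.25, 1.5), (8.5, 8.75)]
```

Here the second pass produces the same assignments as the first, so the loop ends. The number of clusters k stays fixed throughout.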
An analysis of items frequently co-occurring in transactions (such as purchases) is known as _____.
a. market segmentation
b. market basket analysis
c. regression analysis
d. cluster analysis
b. market basket analysis
A _____ refers to the number of times that a collection of items occur together in a transaction data set.
a. test set
b. validation count
c. support count
d. training set
c. support count
In the theory of association rules in data mining, by confidence we mean an estimated probability that
a. the antecedent and consequent occur
b. the antecedent occurs given that the consequent occurs
c. the consequent occurs given that the antecedent occurs
d. the antecedent or the consequent occur
c. the consequent occurs given that the antecedent occurs
The lift ratio of an association rule with a confidence value of 0.88 and in which the consequent occurs in 60 out of 100 cases is:
a. 1.30
b. 0.54
c. 1.00
d. 1.47
d. 1.47
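The arithmetic behind this answer is confidence divided by the consequent's share of all transactions: 0.88 / 0.60 ≈ 1.47. The sketch below reproduces that figure and also shows support count and confidence on a tiny transaction list; the function names and the grocery transactions are hypothetical.

```python
def support_count(transactions, items):
    """Number of transactions that contain every item in the collection."""
    wanted = set(items)
    return sum(1 for t in transactions if wanted <= set(t))


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    return both / support_count(transactions, antecedent)


def lift_ratio(rule_confidence, consequent_share):
    """Confidence divided by the share of transactions containing the consequent."""
    return rule_confidence / consequent_share


# The numbers from the card above.
print(round(lift_ratio(0.88, 60 / 100), 2))  # 1.47

# Hypothetical transactions for the rule {bread} -> {milk}.
baskets = [
    {"bread", "milk"},
    {"bread", "jam"},
    {"bread", "milk", "jam"},
    {"milk"},
    {"bread", "milk"},
]
rule_conf = confidence(baskets, {"bread"}, {"milk"})       # 3 / 4
milk_share = support_count(baskets, {"milk"}) / len(baskets)  # 4 / 5
print(rule_conf, lift_ratio(rule_conf, milk_share))
```

A lift ratio above 1 suggests the antecedent makes the consequent more likely than it is overall; in the toy baskets the ratio falls below 1.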
A group of observations measured at successive time intervals is known as
a. a random variable
b. a time series
c. a forecast
d. cross-sectional data
b. a time series
The time series pattern which reflects a multi-year pattern of being above and below the trend line is
a. a trend
b. seasonal
c. cyclical
d. irregular
c. cyclical
The time series pattern that reflects variability during a single year is called
a. a trend
b. seasonal
c. cyclical
d. irregular
b. seasonal
The time series pattern that reflects gradual increase or decrease in values over a long time period is called
a. a trend
b. seasonal
c. cyclical
d. irregular
a. a trend
The pattern of a time series in business forecasting that is most difficult to predict is
a. trend and seasonal pattern
b. seasonal pattern
c. trend pattern
d. cyclical pattern
d. cyclical pattern
If data for a time series analysis is collected on an annual basis only, which pattern may be ignored?
a. trend
b. seasonal
c. cyclical
d. irregular
b. seasonal
Which of the following data patterns best describes the scenario shown in the below plot?
a. Time series with a linear trend pattern
b. Time series with a nonlinear trend pattern
c. Time series with no pattern
d. Time series with a horizontal pattern
d. Time series with a horizontal pattern
Which of the following data patterns best describes the scenario shown in the given quarterly time series plot?
a. Linear trend pattern
b. Logarithmic trend
c. Exponential trend
d. Seasonal pattern
d. Seasonal pattern
A method that uses a weighted average of all past values is known as
a. a smoothing average
b. a moving average
c. an exponential average
d. an exponential smoothing
d. an exponential smoothing
One measure of the accuracy of a forecasting model is
a. the smoothing constant
b. a deseasonalized time series
c. the mean square error
d. the seasonal index
c. the mean square error
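Exponential smoothing and the mean square error from the last two cards fit together naturally. The sketch below assumes the common textbook recursion F(t+1) = αY(t) + (1 − α)F(t), started by setting the forecast for period 2 equal to the actual value of period 1; the series and the smoothing constant are made up.

```python
def exponential_smoothing_forecasts(series, alpha):
    """Return forecasts for periods 2 .. n+1 (series holds periods 1 .. n)."""
    forecasts = [series[0]]  # assumption: forecast for period 2 is the period 1 actual
    for actual in series[1:]:
        # New forecast = weighted average of the latest actual and the latest forecast.
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts


def mean_squared_error(actuals, forecasts):
    """Average of the squared forecast errors."""
    squared = [(a - f) ** 2 for a, f in zip(actuals, forecasts)]
    return sum(squared) / len(squared)


# Hypothetical time series and smoothing constant.
demand = [10.0, 12.0, 11.0, 13.0]
fitted = exponential_smoothing_forecasts(demand, alpha=0.5)
print(fitted)  # [10.0, 11.0, 11.0, 12.0]; the last entry is the forecast for period 5

# Compare forecasts for periods 2..4 with the actual values for periods 2..4.
mse = mean_squared_error(demand[1:], fitted[:-1])
print(mse)
```

Because each forecast folds in the previous forecast, every past value carries some weight, with the weights shrinking for older observations. Trying several values of alpha and keeping the one with the lowest MSE is one simple way to pick the smoothing constant.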
For a time series with 17 time periods, the following linear trend expression was estimated: 𝑦̂𝑡 = 129.2 + 3.8t. The forecast for time period 18 is
a. 68.4
b. 193.8
c. 197.6
d. 6.84
c. 197.6
For a quarterly time series over the last 4 years, the following linear trend expression was estimated: 𝑦̂𝑡 = 120 + 2t. The forecast for the second quarter of Year 5 is
a. 124
b. 156
c. 154
d. 160
b. 156
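Both trend questions come down to plugging the right period number into the fitted line. For the quarterly series, four years of data cover periods 1 to 16, so the second quarter of Year 5 is period 18. The helper name below is hypothetical.

```python
def linear_trend_forecast(intercept, slope, t):
    """Forecast from a fitted linear trend: intercept + slope * t."""
    return intercept + slope * t


# 17 periods observed, forecast for period 18.
print(round(linear_trend_forecast(129.2, 3.8, 18), 1))  # 197.6

# Quarterly data for 4 years = 16 periods; Year 5, quarter 2 is period 16 + 2.
period = 4 * 4 + 2
print(linear_trend_forecast(120, 2, period))  # 156
```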
Which of the following statements is the objective of the moving averages and exponential smoothing methods?
a. To characterize the variable fluctuations by a smooth curve
b. To smooth out random fluctuations in the time series
c. To characterize the variable fluctuations by an exponential equation
d. To transform a nonstationary time series into a stationary series
b. To smooth out random fluctuations in the time series
A monthly time series has a seasonal pattern. If a linear regression model is used, how many dummy variables must be used to represent this seasonality?
a. 10
b. 11
c. 12
d. 13
b. 11
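The count is one fewer than the number of seasons because one month serves as the baseline, captured by the regression intercept; a twelfth dummy would duplicate information the other eleven already carry. A minimal sketch, assuming months are numbered 1 to 12 and December is the (arbitrarily chosen) baseline:

```python
def month_dummies(month, baseline=12):
    """Eleven 0/1 indicators for a month number; the baseline month is all zeros."""
    return [1 if month == m else 0 for m in range(1, 13) if m != baseline]


print(len(month_dummies(3)))  # 11
print(month_dummies(3))       # 1 in the March position, 0 elsewhere
print(month_dummies(12))      # all zeros: December is the baseline
```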
_____ is the process of estimating the value of a categorical outcome variable.
a. Sampling
b. Prediction
c. Classification
d. Validation
c. Classification
In classification, which of the following would be considered as a categorical variable for a credit approval decision for a requester?
a. marital status of the requester
b. reject or accept credit approval
c. income of the requester
d. gender of the requester
b. reject or accept credit approval
The effectiveness of a classification method can be judged by computing the misclassification errors and summarizing them in a
a. pivot table
b. payoff table
c. dendrogram
d. confusion matrix
d. confusion matrix
Test set is the data set used to:
a. build the data mining model.
b. estimate accuracy of candidate models on unseen data.
c. estimate accuracy of final model on unseen data.
d. show counts of actual versus predicted class values.
c. estimate accuracy of final model on unseen data.
An observation is classified as Class 1 if
a. the predicted probability of this observation to be in Class 1 is less than the cutoff value
b. the predicted probability of this observation to be in Class 1 is greater than or equal to the cutoff value
c. the allowable probability of making Class 1 error is less than the test p-value
d. the allowable probability of making Class 1 error is greater than or equal to the test p-value
b. the predicted probability of this observation to be in Class 1 is greater than or equal to the cutoff value
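The cutoff rule and the confusion matrix from the earlier card can be combined in a short sketch. The function names, the 0.5 default cutoff, and the sample probabilities are assumptions for illustration.

```python
def classify(probabilities, cutoff=0.5):
    """Class 1 when the predicted probability is at least the cutoff, else Class 0."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def confusion_matrix(actual, predicted):
    """Counts keyed by (actual class, predicted class)."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts


# Hypothetical actual classes and predicted Class 1 probabilities.
actual_classes = [1, 0, 1, 1, 0, 0]
class1_probs = [0.9, 0.4, 0.5, 0.2, 0.7, 0.1]

predicted_classes = classify(class1_probs)
matrix = confusion_matrix(actual_classes, predicted_classes)
error_rate = (matrix[(1, 0)] + matrix[(0, 1)]) / len(actual_classes)

print(predicted_classes)  # [1, 0, 1, 0, 1, 0]
print(matrix)
print(error_rate)
```

The observation with probability exactly 0.5 lands in Class 1 because the rule is "greater than or equal to". Raising or lowering the cutoff trades one kind of misclassification for the other.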
In the k-nearest neighbors method, when the value of k is set to 1,
a. the classification or prediction of a new observation is based solely on the single most similar observation from the training set.
b. the new observation's class is naïvely assigned to the most common class in the training set.
c. the new observation's prediction is used to estimate the anticipated error rate on future data over the entire training set.
d. the classification or prediction of a new observation is subject to the smallest possible classification error.
a. the classification or prediction of a new observation is based solely on the single most similar observation from the training set.
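To see what k = 1 means in practice, the sketch below classifies a new observation by majority vote among its k nearest training observations. It assumes Euclidean distance; the function name, the training rows, and the credit labels are hypothetical.

```python
from collections import Counter
from math import dist


def knn_classify(training, new_point, k=1):
    """Majority class among the k training observations closest to new_point."""
    nearest = sorted(training, key=lambda row: dist(row[0], new_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (two variable values, class label).
training_set = [
    ((1.0, 1.0), "reject"),
    ((2.0, 1.5), "reject"),
    ((3.0, 3.0), "accept"),
    ((6.0, 5.0), "accept"),
    ((7.0, 6.5), "accept"),
]
applicant = (2.8, 2.9)

print(knn_classify(training_set, applicant, k=1))  # accept
print(knn_classify(training_set, applicant, k=3))  # reject
```

With k = 1 the single closest observation decides the class. With k = 3 two of the three nearest neighbors are rejections, so the vote flips, which shows how sensitive the method can be to the choice of k.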
_____ is a generalization of linear regression for predicting an outcome of a binary variable.
a. Multiple linear regression
b. Logistic regression
c. The k-nearest neighbors method
d. Cluster analysis
b. Logistic regression
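One way to picture the link to linear regression: logistic regression passes a linear combination of the inputs through the logistic function, so the result is a probability between 0 and 1. The coefficients below are made up for illustration, not estimated from any data.

```python
from math import exp


def logistic_probability(x, intercept, slope):
    """Probability of the '1' outcome from a one-variable logistic model."""
    linear_part = intercept + slope * x  # same form as a linear regression equation
    return 1.0 / (1.0 + exp(-linear_part))


# Hypothetical coefficients.
b0, b1 = -4.0, 0.1

for x in (20, 40, 60):
    print(x, round(logistic_probability(x, b0, b1), 3))
```

The resulting probability can then be compared with a cutoff value to assign a class.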
Spreadsheet models are referred to as what-if models because they
a. are mathematical and logic-based models.
b. allow easy instantaneous recalculation for a change in model inputs.
c. come preloaded on computers.
d. have specialized functions to perform detailed analysis.
b. allow easy instantaneous recalculation for a change in model inputs.
In _____ decision making, companies have to decide whether they should manufacture a product or outsource its production to another firm.
a. goal seek
b. two-way
c. voting-based
d. make-versus-buy
d. make-versus-buy
The modeling process begins with the framing of the _____ that shows the relationships between the various parts of the problem being modeled.
a. mathematical model
b. conceptual model
c. circular model
d. correlation model
b. conceptual model
A(n) _____ is a visual representation that shows which entities influence others in a model.
a. decision tree diagram
b. influence diagram
c. entity chart
d. time series plot
b. influence diagram
Which of the following approaches is a good way to proceed with the influence diagram building for a problem?
a. The influence diagram for the entire problem is built first and then separate portions are clustered to form separate models.
b. The influence diagram for all the model parts at the same level are built in parallel to reduce the likelihood of error.
c. The influence diagram is reverse engineered: the diagram is developed in the opposite direction, starting with the model output.
d. The influence diagram for a portion of the problem is built first and then expanded until the total problem is conceptually modeled.
d. The influence diagram for a portion of the problem is built first and then expanded until the total problem is conceptually modeled.
With reference to a what-if model, an uncontrollable model input is known as a(n) _____.
a. decision variable
b. dummy variable
c. parameter
d. outlier
c. parameter
A(n) _____ refers to a model input that the decision maker can control in a what-if model.
a. decision variable
b. outlier
c. parameter
d. dummy variable
a. decision variable
A one-way data table summarizes:
a. a single input's impact on the output of interest.
b. multiple inputs' impact on a single output of interest.
c. values of the input cells that will cause the single output value to equal zero.
d. values of cells when not all of the model is observable on the screen.
a. a single input's impact on the output of interest.
The impact of two inputs on the output of interest can be examined by a _____.
a. Goal Seek
b. Watch Window
c. one-way data table
d. two-way data table
d. two-way data table
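Outside a spreadsheet, a data table is just the model recalculated over a grid of input values. The sketch below uses a made-up profit model in which price and quantity are varied while unit cost and fixed cost act as parameters; every name and number is hypothetical.

```python
def profit(price, quantity, unit_cost=4.0, fixed_cost=100.0):
    """Hypothetical what-if model: contribution margin times quantity, less fixed cost."""
    return (price - unit_cost) * quantity - fixed_cost


prices = [8.0, 10.0]
quantities = [50, 100]

# One-way data table: vary a single input, hold everything else fixed.
one_way = {p: profit(p, 100) for p in prices}

# Two-way data table: vary two inputs at once.
two_way = {(p, q): profit(p, q) for p in prices for q in quantities}

print(one_way)  # {8.0: 300.0, 10.0: 500.0}
for (p, q), value in two_way.items():
    print(p, q, value)
```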
The _____ function pairs each element of the first array with its counterpart in the second array, multiplies the elements of the pairs together, and adds the results.
a. SUM
b. SUMPRODUCT
c. SUMIF
d. IF
b. SUMPRODUCT
The arguments supplied to the IF function are:
a. the condition for execution, the result if condition is true, and the result if condition is false.
b. the range of cells and the condition for execution.
c. the array1 of data cells, the array2 of data cells, and the condition for execution.
d. the condition for execution only.
a. the condition for execution, the result if condition is true, and the result if condition is false.
Within a given range of cells, the number of times a particular condition is satisfied is computed by using the _____ function.
a. SUMIF
b. IF
c. VLOOKUP
d. COUNTIF
d. COUNTIF
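For readers who think in code, the three spreadsheet functions above have simple Python analogues. These are stand-ins written for illustration, not Excel's own implementations; in particular, `countif` here takes a Python function as its condition, whereas the spreadsheet function takes a criteria expression.

```python
def sumproduct(array1, array2):
    """Multiply paired elements of two equal-length arrays and add the results."""
    if len(array1) != len(array2):
        raise ValueError("arrays must have the same length")
    return sum(a * b for a, b in zip(array1, array2))


def if_(condition, value_if_true, value_if_false):
    """Three arguments: the condition, the result if true, the result if false."""
    return value_if_true if condition else value_if_false


def countif(cells, condition):
    """Number of cells in the range that satisfy the condition."""
    return sum(1 for cell in cells if condition(cell))


# Hypothetical cell values.
units = [2, 3, 4]
unit_prices = [10, 20, 30]
scores = [5, 12, 7, 20]

print(sumproduct(units, unit_prices))                 # 200
print(if_(sum(scores) > 40, "over", "within"))        # over
print(countif(scores, lambda value: value > 6))       # 3
```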
The uncontrollable future events that affect the outcome of a decision are known as
a. alternatives
b. decision outcomes
c. payoffs
d. states of nature
d. states of nature
A tabular representation of gains or losses for a decision problem is called a
a. decision tree
b. payoff table
c. sequential matrix
d. probability table
b. payoff table
A graphic presentation of the expected gain from the various options open to the decision maker is called
a. a payoff table
b. a decision tree
c. the expected opportunity loss
d. the expected value of perfect information
b. a decision tree
Nodes of a decision tree indicating points where a decision maker chooses a decision alternative are known as
a. decision nodes
b. chance nodes
c. marginal nodes
d. conditional nodes
a. decision nodes
Nodes of a decision tree indicating points where the outcomes do not depend on a decision maker are known as
a. decision nodes
b. chance nodes
c. marginal nodes
d. conditional nodes
b. chance nodes
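A payoff table and a one-stage decision tree carry the same information: at each chance node the payoffs are weighted by the probabilities of the states of nature, and at the decision node the alternative with the best expected value is chosen. The alternatives, states, payoffs, and probabilities below are made up for illustration.

```python
# Hypothetical payoff table: payoffs[alternative][state of nature].
payoffs = {
    "small plant": {"strong demand": 8, "weak demand": 7},
    "large plant": {"strong demand": 20, "weak demand": -9},
}
state_probs = {"strong demand": 0.8, "weak demand": 0.2}


def expected_value(alternative):
    """Chance node: probability-weighted average payoff for one alternative."""
    return sum(state_probs[s] * payoff for s, payoff in payoffs[alternative].items())


# Decision node: pick the alternative with the highest expected value.
expected_values = {alt: expected_value(alt) for alt in payoffs}
best_alternative = max(expected_values, key=expected_values.get)

for alt, ev in expected_values.items():
    print(alt, round(ev, 2))
print("choose:", best_alternative)
```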
Prior probabilities are the probabilities of the states of nature that are estimated
a. after obtaining sample information
b. before obtaining perfect information
c. before obtaining sample information
d. after obtaining perfect information
c. before obtaining sample information
The probabilities of states of nature after revising the prior probabilities based on given sample information are called
a. the expected probabilities
b. the posterior probabilities
c. the prior probabilities
d. the unconditional probabilities
b. the posterior probabilities
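The revision from prior to posterior probabilities follows Bayes' theorem: multiply each prior by the likelihood of the sample result under that state, then divide by the total. The priors and likelihoods below are hypothetical.

```python
def posterior_probabilities(priors, likelihoods):
    """Revise prior state probabilities given the likelihood of the sample result."""
    joint = {state: priors[state] * likelihoods[state] for state in priors}
    total = sum(joint.values())  # overall probability of the sample result
    return {state: value / total for state, value in joint.items()}


# Hypothetical priors, estimated before any sample information.
prior = {"strong demand": 0.8, "weak demand": 0.2}
# Hypothetical chance of a favorable market report under each state of nature.
favorable_given_state = {"strong demand": 0.9, "weak demand": 0.25}

posterior = posterior_probabilities(prior, favorable_given_state)
for state, prob in posterior.items():
    print(state, round(prob, 4))
```

After a favorable report, the probability of strong demand rises from the prior 0.8 to roughly 0.935.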