451: chapter 8, 9, 4 (exam 2)
Terms in this set (55)
(time series quiz):
a positive forecast error indicates that the forecasting method ___________ the dependent variable
underestimated
(time series quiz):
autoregressive models
occur whenever all the independent variables are previous values of the time series.
(time series quiz):
the moving averages and exponential smoothing methods are appropriate for a time series exhibiting
a horizontal pattern
(time series quiz):
a causal model provides evidence of ______________ between an independent variable and the variable to be forecast
an association
(time series quiz):
what is true of stationary time series?
stationary time series = a time series whose statistical properties are independent of time; the process of generating the data has a constant mean; the variability of the time series is constant over time; will always exhibit a horizontal pattern
(time series quiz):
_____________ uses a weighted average of past time series values as the forecast
exponential smoothing
(time series quiz):
______________ is the amount by which the predicted value differs from the observed value of the time series variable
forecast error
(time series quiz):
trend refers to:
the long run shift or movement in the time series observable over several periods of time
time series
a sequence of observations on a variable measured at successive points in time or over successive periods of time
time series plot
a graphical representation of the relationship between time (horizontal axis) and the time series variable (vertical axis); used to identify the underlying pattern in the data; should be one of the first analytic tools used when trying to determine which forecasting method to use
stationary time series
a time series whose statistical properties are independent of time; constant mean and constant variability; will always exhibit a horizontal pattern
horizontal pattern
when data fluctuate randomly around a constant mean over time
trend
time series showing gradual shifts/movements to relatively higher or lower values over a longer period of time; usually a result of long-term factors
seasonal patterns
recognized by observing recurring patterns over successive periods of time
cyclical pattern
exists if the time series plot shows an alternating sequence of points below and above the trendline that lasts for more than one year
naive forecasting
a forecasting technique that uses the value of the time series from the most recent period as the forecast for the current period; the simplest forecasting method
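A minimal Python sketch of naive forecasting, where each period's forecast is the previous period's observed value. The function name and the sample values are made up for illustration.

```python
def naive_forecasts(series):
    """Forecasts for periods 2..n: each one is the previous period's actual value."""
    return list(series[:-1])


# Hypothetical observations for four periods.
sales = [17, 21, 19, 23]

# The forecast for period 2 is the period 1 value, and so on.
print(naive_forecasts(sales))  # [17, 21, 19]
```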
trend-cycle effects
the result of long-term trend effects combined with cyclical effects
forecast error
the amount by which the forecasted value differs from the observed value
positive error
indicates that the forecasting method underestimated the actual value
negative error
indicates that the forecast overestimated the actual value
mean absolute error (MAE)
a measure of forecast accuracy that avoids the problem of positive and negative forecast errors offsetting one another; MAE = the average of the absolute values of the forecast errors
mean squared error (MSE)
a measure of the accuracy of a forecasting method; the average of the squared differences between the forecast values and the actual time series values
mean absolute percentage error (MAPE)
a measure of the accuracy of the forecasting method; the average of the absolute values of the errors expressed as a percentage of the corresponding actual values
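The three accuracy measures can be sketched in Python as follows. This sketch assumes the forecast error is actual minus forecast (so a positive error means the forecast was too low) and that MAPE divides each error by the actual value, which is the usual convention. Function names and data are illustrative only.

```python
def forecast_errors(actual, forecast):
    """Error = actual - forecast; positive means the forecast underestimated."""
    return [a - f for a, f in zip(actual, forecast)]


def mae(actual, forecast):
    """Mean absolute error: average of the absolute errors."""
    errors = forecast_errors(actual, forecast)
    return sum(abs(e) for e in errors) / len(errors)


def mse(actual, forecast):
    """Mean squared error: average of the squared errors."""
    errors = forecast_errors(actual, forecast)
    return sum(e ** 2 for e in errors) / len(errors)


def mape(actual, forecast):
    """Mean absolute percentage error; assumes no actual value is zero."""
    errors = forecast_errors(actual, forecast)
    return 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / len(errors)


# Made-up actual values and forecasts for four periods.
actual = [20, 25, 40, 10]
forecast = [18, 30, 40, 11]

print(forecast_errors(actual, forecast))  # [2, -5, 0, -1]
print(mae(actual, forecast))              # 2.0
print(mse(actual, forecast))              # 7.5
print(round(mape(actual, forecast), 2))
```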
moving average method
uses the average of the most recent 'k' data values in the time series as the forecast for the next period
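A short Python sketch of a moving average forecast of order k. The function name and sample data are assumptions for illustration.

```python
def moving_average_forecast(series, k):
    """Forecast the next period as the mean of the k most recent values."""
    if k <= 0 or k > len(series):
        raise ValueError("k must be between 1 and the number of observations")
    return sum(series[-k:]) / k


# Made-up observations; with k = 3 the forecast averages 21, 19, and 23.
sales = [17, 21, 19, 23]
print(moving_average_forecast(sales, 3))  # 21.0
```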
exponential smoothing
uses a weighted average of past time series values as a forecast
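One common way to compute this weighted average is the recursion F(t+1) = alpha * Y(t) + (1 - alpha) * F(t). The Python sketch below assumes that form, starts with the first forecast equal to the first observation (a common convention, not something stated on the card), and uses a made-up smoothing constant.

```python
def exponential_smoothing_forecast(series, alpha):
    """Return the forecast for the period after the last observation.

    Assumes the forecast for period 2 equals the period 1 observation.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    forecast = series[0]  # forecast for period 2
    for observed in series[1:]:
        # New forecast = weighted average of the latest value and the old forecast.
        forecast = alpha * observed + (1 - alpha) * forecast
    return forecast


# Made-up data: the forecasts are 10 (period 2), 15 (period 3), 22.5 (period 4).
print(exponential_smoothing_forecast([10, 20, 30], alpha=0.5))  # 22.5
```

A larger alpha puts more weight on the most recent observation; a smaller alpha smooths more heavily.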
autoregressive models
occur whenever all the independent variables are previous values of the time series
causal models
forecasting methods that relate a time series to other variables that are believed to explain its behavior
observation (record)
the set of recorded values of variables associated with a single entity, often displayed as a row in a spreadsheet or database
variable (feature)
a characteristic or quantity of interest that can take on different values
supervised learning
category of data-mining techniques in which an algorithm learns how to predict or classify an outcome variable of interest
unsupervised learning
designed to describe patterns and relationships in large data sets with many observations of many variables
there is no outcome variable to predict; the goal is to use the variable values to identify relationships between observations
estimation
a predictive data-mining task requiring the prediction of an observation's continuous outcome value
classification
a predictive data-mining task requiring the prediction of an observation's outcome class or category
data mining process (6 steps)
1. data sampling
2. data preparation
3. data partitioning
4. data exploration
5. model construction
6. model assessment
model overfitting
a situation in which a model explains random patterns in the data on which it is trained rather than just the relationships, resulting in training-set accuracy that far exceeds accuracy for the new data
training set
data used to build candidate predictive models
validation set
data used to evaluate candidate predictive models
test set
data used to compute an unbiased estimate of the final predictive model's accuracy
classification confusion matrix
a matrix showing the counts of actual versus predicted class values
overall error rate
the percentage of observations misclassified by a model in a data set
accuracy
measure of classification success defined as 1 minus the overall error rate
false positive
the misclassification of a Class 0 observation as Class 1
false negative
the misclassification of a Class 1 observation as Class 0
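The confusion matrix, overall error rate, and accuracy cards can be tied together with a small Python sketch. The labels below are invented, and the helper names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count actual-versus-predicted outcomes for a 0/1 classification."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1  # Class 0 misclassified as Class 1
        else:
            counts["FN"] += 1  # Class 1 misclassified as Class 0
    return counts


def overall_error_rate(counts):
    """Share of observations that were misclassified."""
    return (counts["FP"] + counts["FN"]) / sum(counts.values())


def accuracy(counts):
    """Accuracy is 1 minus the overall error rate."""
    return 1 - overall_error_rate(counts)


# Made-up actual and predicted classes for eight observations.
actual = [1, 0, 1, 0, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1, 0, 0, 0]

counts = confusion_counts(actual, predicted)
print(counts)                      # {'TP': 2, 'TN': 3, 'FP': 1, 'FN': 2}
print(overall_error_rate(counts))  # 0.375
print(accuracy(counts))            # 0.625
```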
cutoff value
the smallest value that the predicted probability of an observation can be for the observation to be classified as Class 1
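A brief Python sketch of applying a cutoff value to predicted probabilities. The probabilities and the default cutoff of 0.5 are illustrative assumptions.

```python
def classify(probabilities, cutoff=0.5):
    """Assign Class 1 when the predicted probability is at least the cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]


# Made-up predicted probabilities of Class 1 membership.
probs = [0.2, 0.5, 0.8]
print(classify(probs, cutoff=0.5))   # [0, 1, 1]
print(classify(probs, cutoff=0.75))  # [0, 0, 1]
```

Raising the cutoff generally reduces false positives at the cost of more false negatives.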
cumulative lift chart
a chart used to present how well a model performs in identifying observations most likely to be in Class 1 as compared with random classification
(descriptive data mining quiz):
the lift ratio of an association rule with a confidence value of 0.45 and in which the consequent occurs in 6 out of 10 cases is:
0.75
rationale:
the lift ratio is given by confidence / (number of transactions containing the consequent / total number of transactions)
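The arithmetic for this card can be checked with a few lines of Python. The function name is hypothetical; the inputs are the values from the question.

```python
def lift_ratio(confidence, consequent_count, total_transactions):
    """Lift = rule confidence divided by the consequent's share of all transactions."""
    consequent_support = consequent_count / total_transactions
    return confidence / consequent_support


# Confidence 0.45; consequent appears in 6 of 10 transactions, so 0.45 / 0.6.
print(round(lift_ratio(0.45, 6, 10), 2))  # 0.75
```

A lift ratio below 1 suggests the rule identifies the consequent less often than random selection would.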
which of the following is true of euclidean distance?
it is commonly used as a method of measuring dissimilarity between quantitative observations
rationale:
when observations include numerical variables, euclidean distance is the most common method to measure dissimilarity between observations
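A minimal Python sketch of Euclidean distance between two observations with numerical variables. The observations are made up.

```python
import math


def euclidean_distance(u, v):
    """Square root of the sum of squared differences across variables."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two hypothetical observations measured on two variables.
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```

Because the measure depends on the units of each variable, variables are often standardized before distances are computed.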
(descriptive data mining quiz):
the data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called:
cluster analysis
rationale:
clustering can be employed during the data preparation step to identify variable/observations that can be aggregated or removed from consideration; divides consumers into homogeneous groups (segmentation)
(descriptive data mining quiz):
which of the following reasons contributes to the increase in the use of data-mining techniques in business?
the ability to electronically warehouse data
rationale:
increase in data-mining:
1. the explosion in amount of data being produced/electronically tracked
2. the ability to electronically warehouse these data
3. affordability of computer power to analyze the data
(descriptive data mining quiz):
in which of the following scenarios would it be appropriate to use hierarchical clustering?
when binary or ordinal data needs to be clustered
rationale:
use hierarchical clustering when there is a small data set; when you want to easily examine solutions with increasing numbers of clusters; when you want to observe how clusters are nested
(descriptive data mining quiz):
_______________ is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters
complete linkage
rationale:
complete linkage is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters
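Complete linkage can be sketched in Python as the largest pairwise distance between members of two clusters. This sketch assumes Euclidean distance (via `math.dist`, available in Python 3.8 and later) and uses invented points.

```python
import math


def complete_linkage(cluster_a, cluster_b):
    """Distance between the two most dissimilar observations, one from each cluster."""
    return max(math.dist(p, q) for p in cluster_a for q in cluster_b)


# Two hypothetical clusters of points on a line.
cluster_a = [(0, 0), (1, 0)]
cluster_b = [(4, 0), (5, 0)]

# The farthest pair is (0, 0) and (5, 0).
print(complete_linkage(cluster_a, cluster_b))  # 5.0
```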
(descriptive data mining quiz):
a cluster's _________________ can be measured by the difference between the distance value at which the cluster is originally formed and the distance value at which it is merged with another cluster in a dendrogram
durability
(descriptive data mining quiz):
____________ approaches are designed to describe patterns and relationships in large data sets with many observations of many variables
unsupervised learning
(descriptive data mining quiz):
a tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as:
dendrogram
(descriptive data mining quiz):
an analysis of items frequently co-occurring in transactions is known as:
market basket analysis