Terms in this set (34)
What is predictive analytics?
Technology that learns from experience (data) to predict future behavior in order to drive better decisions
What does data mining use to make predictions?
What are 4 examples of Predictive Analytics?
What is a dependent variable called in Predictive Analytics?
Output / Target Variable
What is a statistical procedure or forecasting model called in Predictive Analytics?
What is an explanatory variable called in Predictive Analytics?
attribute / feature
What is an observation called in Predictive Analytics?
What is the act of forecasting/predicting called in Predictive Analytics?
What is Supervised Learning?
Providing an algorithm with records in which a target is known
There is a specified target
What is Unsupervised learning?
Learning something about the data without an output of interest
There is no specific target
What is Training Data?
Used to Fit the Model Parameters
What is Validation Data?
Used to assess the Model Fit
What is Test Data?
Used only once, at the end of the modeling process, to assess performance on data the model has not seen
Required for some government work
What is SEMMA?
What is an independent variable called in Predictive Analytics?
What is a success class?
The class of interest in a binary outcome (e.g., the purchasers if you're comparing purchase/no purchase)
What is a Lift Chart?
A chart showing how much better the model predicts than the Naive Model
AKA a Gains Chart
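A single point on a lift chart can be sketched as below. This is a minimal illustration, not the course's exact procedure: the function name `lift_at` is hypothetical, `actual` is assumed to hold 1 for the success class and 0 otherwise, and the baseline is assumed to be the overall success rate (what you would get by picking records without a model).

```python
def lift_at(scores, actual, fraction):
    """Lift in the top `fraction` of records, ranked by model score.

    Assumption: actual[i] is 1 for the success class, 0 otherwise,
    and at least one record is a success (otherwise the baseline is 0).
    """
    # Rank records from highest to lowest predicted score.
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))

    # Success rate among the top-ranked records.
    top_rate = sum(a for _, a in ranked[:n_top]) / n_top

    # Baseline: success rate across all records.
    overall_rate = sum(actual) / len(actual)

    return top_rate / overall_rate
```

A lift above 1 means the model finds the success class faster than the baseline; plotting this value for several fractions gives the chart.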
What is a Linear Classifier?
Classifying based on one-side or another of a line demarcating the data
One fewer line than there are classes - consider the 3 Iris types separated by two lines
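The "one side or the other of a line" idea can be sketched in a few lines. The function name, the side labels, and the example weights are illustrative assumptions; the line is written as w1*x1 + w2*x2 + bias = 0.

```python
def linear_classify(point, weights, bias):
    """Return which side of the line (weights . point + bias = 0) a point falls on."""
    # The sign of the score says which side of the line the point is on.
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return "positive side" if score >= 0 else "negative side"


# Example line: x + y - 1 = 0
print(linear_classify((2, 2), weights=(1, 1), bias=-1))
print(linear_classify((0, 0), weights=(1, 1), bias=-1))
```

Fitting a linear classifier amounts to choosing the weights and bias; this sketch only shows how a fitted line is used.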
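What is K-Fold Cross Validation?
A Resampling Method that splits the data into k folds and repeats the fit-and-assess process k times, holding out a different fold each time.
With 5-fold and 100 data points, create 5 folds of 20 data points each and repeat 5 times, fitting on 80 points and assessing on the held-out 20.
In practice, k = 5 or k = 10 is typical.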
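The fold bookkeeping for the 100-point, 5-fold example can be sketched as follows. The function name `k_fold_indices` is hypothetical, and this version does not shuffle the rows first, which real tools usually do.

```python
def k_fold_indices(n_points, k):
    """Return k (training, validation) splits of the row indices 0..n_points-1."""
    indices = list(range(n_points))
    fold_size = n_points // k

    # Cut the indices into k consecutive folds.
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    # Any leftover rows (when n_points is not divisible by k) go to the last fold.
    folds[-1].extend(indices[k * fold_size:])

    splits = []
    for held_out in range(k):
        validation = folds[held_out]
        # Training data is every fold except the held-out one.
        training = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        splits.append((training, validation))
    return splits


for training, validation in k_fold_indices(100, 5):
    print(len(training), len(validation))
```

Each row is used for validation exactly once and for training k - 1 times.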
What are the 4 Big Data Characteristics?
What is predictive analytics NOT?
What are the 4 ways we compare classification algorithms?
Speed and Scalability
Cross-Validation is similar to....
the holdout method
What is the Misclassification Rate?
Incorrect Classifications / All Records
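As a quick worked example of that ratio, here is a minimal sketch; the function name and the sample labels are made up for illustration.

```python
def misclassification_rate(actual, predicted):
    """Incorrect classifications divided by all records."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    incorrect = sum(1 for a, p in zip(actual, predicted) if a != p)
    return incorrect / len(actual)


actual = ["buy", "no", "buy", "no"]
predicted = ["buy", "buy", "buy", "no"]
print(misclassification_rate(actual, predicted))  # 1 wrong out of 4 -> 0.25
```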
What is the best measure of accuracy for Predictive Analytics?
Correct Classifications are on what diagonal in the Confusion Matrix?
Upper Left to Bottom Right
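To see why the correct classifications sit on that diagonal, here is a small sketch that builds a confusion matrix by hand. It assumes rows are actual classes and columns are predicted classes; some tools use the opposite layout, but the diagonal holds the correct counts either way. The function names are hypothetical.

```python
def confusion_matrix(actual, predicted, classes):
    """Count records by (actual class, predicted class). Rows = actual, columns = predicted."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def correct_count(matrix):
    """Sum the upper-left to bottom-right diagonal: actual class == predicted class."""
    return sum(matrix[i][i] for i in range(len(matrix)))


actual = ["buy", "no", "buy", "no", "no"]
predicted = ["buy", "buy", "buy", "no", "no"]
m = confusion_matrix(actual, predicted, classes=["buy", "no"])
print(m)                 # [[2, 0], [1, 2]]
print(correct_count(m))  # 4
```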
What are the two facets of Speed/Scalability?
Time to Construct
Time to Use
What are the 4 considerations for Robustness?
Aka, how does the model handle...
What does Interpretability mean?
The learned classifier tells us something about the domain it is in.
What is the confusion matrix called in IBM/SPSS?
What is the Naive Model for Predictive Analytics?
Classifying everything as belonging to the most prevalent class
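A minimal sketch of the Naive Model, assuming the "most prevalent class" is measured on the training labels; the function name and sample data are illustrative only.

```python
from collections import Counter


def naive_model(training_labels):
    """Return a predictor that ignores its input and always predicts the most prevalent class."""
    most_common_class, _count = Counter(training_labels).most_common(1)[0]
    return lambda record: most_common_class


labels = ["no purchase"] * 7 + ["purchase"] * 3
predict = naive_model(labels)
print(predict({"age": 42}))  # no purchase
```

Any real model should beat this baseline to be worth using.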
How is a partition created?
Rows are assigned a number
A Number Generator (which is pseudo-random, since it starts from a seed number) determines which rows go into training vs. validation
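The partitioning steps can be sketched as below. The 60/40 split, the seed value, and the function name are assumptions for illustration, not values from the course.

```python
import random


def partition_rows(n_rows, train_fraction=0.6, seed=12345):
    """Assign each row number to training or validation using a seeded generator."""
    rng = random.Random(seed)           # same seed -> same partition on every run
    row_numbers = list(range(n_rows))   # each row is assigned a number
    rng.shuffle(row_numbers)            # the generator decides the order
    n_train = round(n_rows * train_fraction)
    return {
        "training": sorted(row_numbers[:n_train]),
        "validation": sorted(row_numbers[n_train:]),
    }


split = partition_rows(10)
print(len(split["training"]), len(split["validation"]))  # 6 4
```

Because the generator is seeded, rerunning with the same seed reproduces the same partition, which makes results repeatable.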
What are 4 beginning mistakes of Predictive Analytics?
Jump Right In
Do it yourself
Fall in love
What are the top 4 Applications for Predictive Analytics according to SAS?