Predictive Analytics
Terms in this set (34)
What is predictive analytics?
Technology that learns from experience (data) to predict future behavior in order to drive better decisions
What does data mining use to make predictions?
Patterns
What are 4 examples of Predictive Analytics tasks?
Classification
Clustering
Association
Outlier detection
What is a dependent variable called in Predictive Analytics?
Output / Target Variable
Response
What is a statistical procedure or forecasting model called in Predictive Analytics?
Algorithm
What is an explanatory variable called in Predictive Analytics?
Attribute / Feature
What is an observation called in Predictive Analytics?
Record
What is the act of forecasting/predicting called in Predictive Analytics?
Scoring
What is Supervised Learning?
Providing an algorithm with records in which a target is known
There is a specified target
What is Unsupervised learning?
Learning something about the data without an output of interest
There is no specific target
What is Training Data?
Used to Fit the Model Parameters
What is Validation Data?
Used to assess the Model Fit
What is Test Data?
Used only at the end of the modeling process to assess performance on data the model has never seen
Required for some government work
What is SEMMA?
Sample
Explore
Modify
Model
Assess
What is an independent variable called in Predictive Analytics?
Predictor
What is a success class?
The class of interest in a binary outcome (e.g., the purchasers when comparing purchase vs. no purchase)
What is a Lift Chart?
Shows our ability to predict beyond the Naive Model
AKA a Gains Chart
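A minimal sketch of the numbers behind a lift (gains) chart, assuming we already have each record's predicted probability of belonging to the success class and its actual 0/1 outcome. The data below is made up for illustration, and the function names are hypothetical. Records are ranked by predicted probability, and the cumulative successes found are compared with what the naive model would find if successes were spread evenly through the data.

```python
def cumulative_gains(probs, actuals):
    """Return (records_examined, successes_found, naive_baseline) tuples.

    probs:   predicted probability of the success class for each record
    actuals: 1 if the record is actually in the success class, else 0
    """
    # Rank records from most to least likely to be a success.
    ranked = sorted(zip(probs, actuals), key=lambda pair: pair[0], reverse=True)
    total_successes = sum(actuals)
    n = len(actuals)
    points = []
    found = 0
    for i, (_, actual) in enumerate(ranked, start=1):
        found += actual
        # Naive model: after i records we expect i * (overall success rate).
        baseline = i * total_successes / n
        points.append((i, found, baseline))
    return points


def lift_at(points, k):
    """Lift after the top k records = model successes / naive successes."""
    _, found, baseline = points[k - 1]
    return found / baseline


# Hypothetical scored records (already listed from highest to lowest probability).
probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actuals = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
points = cumulative_gains(probs, actuals)
print(round(lift_at(points, 2), 2))  # 2.5
```

Plotting successes found and the naive baseline against records examined gives the gains chart; the ratio between the two curves is the lift, which falls to 1.0 once every record has been examined.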
What is a Linear Classifier?
Classifying records based on which side of a line demarcating the data they fall on
Uses one fewer line than there are classes; e.g., the 3 Iris types are separated with two lines
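A minimal sketch of a linear classifier over two features, assuming the lines are already known. The weights, intercepts, and class labels "A", "B", "C" are hypothetical and not fitted to any real data set. Each record is classified by the sign of w1*x1 + w2*x2 + b, and three classes are separated with two lines by testing them in sequence.

```python
def side_of_line(x, w, b):
    """Return +1 if w . x + b >= 0 (positive side of the line), else -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def classify_three(x, line1, line2):
    """Separate three classes ('A', 'B', 'C') with two lines.

    line1 splits class A from the other two; line2 then splits B from C.
    Each line is a (weights, intercept) pair.
    """
    if side_of_line(x, *line1) < 0:
        return "A"
    return "B" if side_of_line(x, *line2) < 0 else "C"


# Hypothetical lines over two features (e.g. petal length, petal width in cm):
# x1 + x2 = 3.0 and x1 + x2 = 6.5.
line1 = ((1.0, 1.0), -3.0)
line2 = ((1.0, 1.0), -6.5)

print(classify_three((1.4, 0.2), line1, line2))  # A
print(classify_three((4.0, 1.3), line1, line2))  # B
print(classify_three((5.5, 2.0), line1, line2))  # C
```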
What is K-Fold Cross Validation?
A resampling method that splits the data into k folds and repeats the training/validation process k times, holding out a different fold for validation each time
If 5-fold with 100 data points, create 5 folds of 20 data points and repeat 5 times.
In practice, k = 5 or k = 10 is typical.
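A minimal sketch of how the k folds could be built, using only the standard library. The function names and the seed value are hypothetical. In each round one fold is held out for validation and the other k - 1 folds are used for training; fitting and scoring the model are left out.

```python
import random


def k_fold_indices(n, k, seed=42):
    """Split record indices 0..n-1 into k folds of (nearly) equal size."""
    indices = list(range(n))
    # Seeded shuffle so the same folds are produced on every run.
    random.Random(seed).shuffle(indices)
    # Deal indices out round-robin so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validation_splits(n, k, seed=42):
    """Yield (training_indices, validation_indices) once per fold."""
    folds = k_fold_indices(n, k, seed)
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# The card's example: 100 records, 5 folds of 20, repeated 5 times.
for training, validation in cross_validation_splits(100, 5):
    print(len(training), len(validation))  # 80 20 on each of the 5 rounds
```

Every record lands in the validation set exactly once, which is what separates k-fold cross-validation from a single holdout split; the k validation scores would then be averaged.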
What are the 4 Big Data Characteristics?
Volume
Velocity
Variety
Value
What is predictive analytics NOT?
Database management
What are the 4 ways we compare classification algorithms?
Predictive Accuracy
Speed and Scalability
Robustness
Interpretability
Cross-Validation is similar to...
The holdout method
What is the Misclassification Rate?
Incorrect Classifications / All Records
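The formula above (incorrect classifications divided by all records) can be sketched as follows. The labels are made up for illustration and the function name is hypothetical.

```python
def misclassification_rate(actual, predicted):
    """Fraction of records whose predicted class differs from the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    incorrect = sum(1 for a, p in zip(actual, predicted) if a != p)
    return incorrect / len(actual)


actual    = ["buy", "buy", "no", "no", "no", "buy", "no", "no"]
predicted = ["buy", "no",  "no", "no", "buy", "buy", "no", "no"]
print(misclassification_rate(actual, predicted))  # 0.25
```

Accuracy is simply 1 minus this rate (0.75 here).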
What is the best measure of accuracy for Predictive Analytics?
Misclassification Rate
Correct classifications lie on which diagonal of the Confusion Matrix?
Upper left to lower right (the main diagonal)
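A minimal sketch of building a confusion matrix and reading the correct classifications off its diagonal. It assumes rows are actual classes and columns are predicted classes; some tools transpose this, but the diagonal holds the correct classifications either way. The data and function name are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """Rows = actual class, columns = predicted class, in the order of `classes`."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


classes = ["buy", "no"]
actual    = ["buy", "buy", "no", "no", "no", "buy", "no", "no"]
predicted = ["buy", "no",  "no", "no", "buy", "buy", "no", "no"]
matrix = confusion_matrix(actual, predicted, classes)

# Correct classifications sit on the upper-left to lower-right diagonal.
correct = sum(matrix[i][i] for i in range(len(classes)))
print(matrix)   # [[2, 1], [1, 4]]
print(correct)  # 6
```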
What are the two facets of Speed/Scalability?
Time to Construct
Time to Use
What are the 4 considerations for Robustness?
i.e., how well does the model handle:
Noise
Missing Values
Irrelevant Features
Streaming data
What does Interpretability mean?
The learned classifier tells us something about the domain it is in.
What is the confusion matrix called in IBM SPSS?
Coincidence Matrix
What is the Naive Model for Predictive Analytics?
Classifying everything as belonging to the most prevalent class
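A minimal sketch of the Naive Model as a benchmark: it learns only the most prevalent class from the training labels and predicts it for every record. Any real model should beat its misclassification rate. The labels and function names are hypothetical.

```python
from collections import Counter


def naive_model(training_labels):
    """Return the most prevalent class in the training data."""
    return Counter(training_labels).most_common(1)[0][0]


def naive_misclassification_rate(training_labels, test_labels):
    """Score every test record as the majority class and count the misses."""
    majority = naive_model(training_labels)
    wrong = sum(1 for label in test_labels if label != majority)
    return wrong / len(test_labels)


train = ["no"] * 7 + ["buy"] * 3
test = ["no", "no", "buy", "no", "buy"]
print(naive_model(train))                         # no
print(naive_misclassification_rate(train, test))  # 0.4
```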
How is a partition created?
Each row is assigned a number
A random number generator (pseudo-random, since it starts from a seed number, which makes the split reproducible) determines which rows go into training vs. validation
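A minimal sketch of a seeded partition, assuming a 60/40 training/validation split and the seed 12345 (both are arbitrary choices, not values from the source). Each row draws a pseudo-random number, and the number decides which partition the row goes into, so the split sizes are only approximately 60/40. Rerunning with the same seed reproduces the same partition.

```python
import random


def partition(rows, train_fraction=0.6, seed=12345):
    """Split rows into training and validation sets.

    Each row gets a pseudo-random number in [0, 1) from a seeded generator;
    rows whose number falls below train_fraction go to training, the rest
    to validation.
    """
    rng = random.Random(seed)
    training, validation = [], []
    for row in rows:
        (training if rng.random() < train_fraction else validation).append(row)
    return training, validation


rows = list(range(1, 21))  # hypothetical row IDs 1..20
train_a, valid_a = partition(rows)
train_b, valid_b = partition(rows)
print(train_a == train_b)  # True: same seed, same partition
```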
What are 4 beginner mistakes in Predictive Analytics?
Jump Right In
Think Big
Do it yourself
Fall in love
What are the top 4 Applications for Predictive Analytics according to SAS?
Cross-sell/up-sell
Campaign Management
Customer Acquisition
Budgeting/Forecasting