Predictive Analytics_Exam I
Terms in this set (34)
Algorithm
Specific procedure used to implement a predictive analytic technique (e.g. cluster analysis, regression)
Case/Record/Observations
One complete set of variable values; considered to be the unit of analysis
Confidence
Conditional probability that an outcome will be realized GIVEN that others are realized.
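As a rough illustration of confidence as a conditional probability, the sketch below estimates P(consequent | antecedent) from counts. The transaction data and the `confidence` helper are hypothetical, made up for this example.

```python
# Hypothetical market-basket data: each set is one case (transaction).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as a ratio of transaction counts."""
    # Cases in which the "IF" part is realized.
    has_antecedent = [t for t in transactions if antecedent <= t]
    if not has_antecedent:
        raise ValueError("antecedent never occurs; confidence is undefined")
    # Of those, cases in which the outcome is also realized.
    has_both = [t for t in has_antecedent if consequent <= t]
    return len(has_both) / len(has_antecedent)

# Bread appears in 4 transactions; 3 of those also contain milk.
bread_to_milk = confidence(transactions, {"bread"}, {"milk"})
print(bread_to_milk)  # 0.75
```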
Model
Global summary of a data set or a description of relationships between or among variables; allows you to make statements about any point in the observation space.
Holdout Sample/Validation Set
Sample of data not used in fitting a model, but used to test the performance of the model. "Stress testing" to SELECT the 'BEST' MODEL
Pattern
Makes statements only about restricted regions of the observation space.
Score
The predicted value, class, or outcome for a case
Supervised Learning
Process by which an algorithm 'learns' how to predict values for new cases based on known output values
Unsupervised Learning
Analysis done to learn something about data other than to predict outcome variables (e.g. clustering)
Training Data
Portion of data used to fit/develop a model - CREATES model CANDIDATES
Test Data
Set of data used as an unbiased assessment in measuring the accuracy of predictions
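One possible way to carve a data set into training, validation (holdout), and test partitions is sketched below. The 60/20/20 proportions, the seed, and the `split_cases` helper are assumptions chosen for illustration, not a prescribed standard.

```python
import random

def split_cases(cases, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle cases, then cut them into training, validation, and test partitions."""
    shuffled = list(cases)
    # A fixed seed keeps the (hypothetical) split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]                   # fit candidate models
    valid = shuffled[n_train:n_train + n_valid]  # compare candidates, pick the best
    test = shuffled[n_train + n_valid:]          # final unbiased accuracy check
    return train, valid, test

train, valid, test = split_cases(range(100))
print(len(train), len(valid), len(test))  # 60 20 20
```

Each case lands in exactly one partition, so the test set plays no role in fitting or selecting the model.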
Target
Outcome of interest. "What do I want to predict?"
Classification
Basic form of data analysis used to arrange observations in classes.
WITHIN group VARIANCE = LOW
BETWEEN group VARIANCE = HIGH
Prediction/Estimation
(Similar to classification) Except the aim here is to predict the value of a numerical variable.
Score Data
New data for which the target value will be predicted
Affinity Analysis (Association Rules)
Rules developed and used to relate observations, i.e. to describe which items or events tend to occur together
Predictive Analytics/Technical Descriptions
Combined use of classification, prediction and affinity analysis
Data Exploration
Examining the data and inferring ideas from it
Data Visualization
Data exploration by way of graphical analysis
Data Reduction
Consolidates a large number of variables into a smaller set (e.g. PCA or Neural Nets)
Ordinal Scale
Only (numerical) order matters; differences between values are not meaningful
Interval Scale
Difference between values is meaningful. There is no ABSOLUTE 0 (i.e. a '0' value does not mean there is none of that variable)
Ratio Scale
Similar to interval, but has an absolute 0 value, so ratios between values are meaningful (e.g. Kelvin temp. scale)
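A small arithmetic sketch of the interval vs. ratio distinction, using Celsius (interval) and Kelvin (ratio). The two temperatures are arbitrary example values.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius reading to Kelvin."""
    return c + 273.15

warm_c, cool_c = 20, 10  # arbitrary example readings

# Differences are meaningful on both scales and agree (up to float rounding).
c_diff = warm_c - cool_c
k_diff = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful on the ratio scale: 20 C is not "twice as hot" as 10 C.
c_ratio = warm_c / cool_c                                        # 2.0, but not meaningful
k_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)  # about 1.035

print(c_diff, round(k_diff, 6), c_ratio, round(k_ratio, 3))
```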
Standardizing/Normalizing Data
Used when variables with the largest scales would dominate and skew results.
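A minimal sketch of standardizing with z-scores (subtract the mean, divide by the standard deviation). The `income` and `age` values are hypothetical, and the population standard deviation is an assumed choice; the sample version works the same way.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed choice)
    if sigma == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mu) / sigma for v in values]

# Hypothetical variables on very different scales.
income = [30000, 45000, 60000, 75000, 90000]
age = [25, 35, 45, 55, 65]

z_income = standardize(income)
z_age = standardize(age)

# After standardizing, neither variable dominates because of its units.
print([round(z, 3) for z in z_income])
print([round(z, 3) for z in z_age])
```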
Line Graph
Used with time-series data, which are values recorded over successive time intervals
Bar Chart
Compares a single stat across groups - height of bar corresponds to focal stat
Scatter Plot
Used to study the association between the values of two numerical variables
Box Plot
Useful for comparing subgroups and seeing distributions over time side-by-side
Histogram
Useful to indicate where transformations are required due to skewness in outcome variables ("Frequency Chart")
Correlation
Measures the strength (and direction) of the linear relationship between two variables
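The sketch below computes the Pearson correlation coefficient by hand to show that it captures linear association only. The data values and the `pearson_r` helper are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_linear = pearson_r(x, [2 * v + 1 for v in x])         # perfect linear relationship
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x**2: related, but not linearly

print(round(r_linear, 6), round(r_curved, 6))  # 1.0 0.0
```

The second pair is perfectly related (y is x squared), yet its correlation is 0, which is why a scatter plot is worth checking alongside the coefficient.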
Principal Component Analysis (PCA)
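Procedure for transforming correlated variables into a set of uncorrelated variables (principal components), each of which is a linear combination of the original variables. Removes overlap of information. (Uncorrelated does not, in general, mean statistically independent.)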
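To make the idea concrete, here is a minimal two-variable sketch that uses the closed-form eigen-decomposition of a 2x2 covariance matrix. The data and the `pca_2d` helper are hypothetical; real analyses would normally rely on a numerical library and often standardize the variables first.

```python
from math import sqrt

def pca_2d(x, y):
    """Return (pc1 scores, pc2 scores, eigenvalues) for two variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xc = [v - mx for v in x]  # centered x
    yc = [v - my for v in y]  # centered y
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(v * v for v in xc) / (n - 1)
    c = sum(v * v for v in yc) / (n - 1)
    b = sum(p * q for p, q in zip(xc, yc)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix (closed form).
    half_trace = (a + c) / 2
    root = sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # Eigenvector for the larger eigenvalue, then its perpendicular.
    if b != 0:
        v1 = (b, lam1 - a)
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = sqrt(v1[0] ** 2 + v1[1] ** 2)
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])
    # Each component is a linear combination of the centered originals.
    pc1 = [p * v1[0] + q * v1[1] for p, q in zip(xc, yc)]
    pc2 = [p * v2[0] + q * v2[1] for p, q in zip(xc, yc)]
    return pc1, pc2, (lam1, lam2)

# Hypothetical, strongly correlated variables.
x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
pc1, pc2, (lam1, lam2) = pca_2d(x, y)

# Covariance between the two components (both already have mean 0).
cov_pc = sum(p * q for p, q in zip(pc1, pc2)) / (len(x) - 1)
share_pc1 = lam1 / (lam1 + lam2)  # fraction of total variance carried by pc1
print(round(cov_pc, 9), round(share_pc1, 3))
```

Because the two components have zero covariance, the information shared by `x` and `y` ends up concentrated in the first component, which is what allows the second to be dropped when reducing dimensions.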
Neural Networks
Procedure for identifying a function that relates variables to each other
Variable Selection
Procedure related to dimension reduction. Select/reject variables based on predefined criteria.
Transformation
Part of dimension reduction:
1) Create a new variable by changing the form of the given variable;
2) Replace the observed set of variables with a smaller set or combination of variables
