Data mining quiz 1
Terms in this set (53)
6 steps to the data mining process
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
Label as Data mining (DM) or Results of data mining (USE): Choose customers who are most likely to respond to an online ad
USE
Label as Data mining (DM) or Results of data mining (USE): Estimate probability of default for a credit application
USE
Label as Data mining (DM) or Results of data mining (USE): Find patterns indicating what customer behavior is more likely to lead to a response to an online ad
DM
Label as Data mining (DM) or Results of data mining (USE): Discover rules that indicate when an account has been defrauded
DM
Define business understanding
Understanding or creating a business problem
Define data understanding
Understanding the strengths and limitations of the data
Define data preparation
Converting the data into the right format
Define modeling
Applying data mining techniques to the data
Define evaluation
Assessing the performance of the data mining results
Define deployment
Putting the data mining results into real use to realize a return on investment (ROI)
What is a simplified representation of reality crafted to serve a purpose
Model
What is: Estimating an unknown value (target)
Prediction
What is: A formula for estimating the unknown value of interest (target). The formula can be a mathematical or logical statement (rule)
Predictive model
What represents a fact or data point.
Described by a set of attributes (fields, columns, variables, or features)
Feature vector (collection of feature values)
Instance/example
Training data is
The input data for the induction algorithm
Validation data is
The data for model evaluation
What are the 2 feature types
Numeric and Categorical
Examples of numeric data
Numbers
Dates
Dimension of 1
Examples of categorical data
Binary
Text
What is the dimensionality of a dataset
The sum of the dimensions of the features
True or false: When conducting supervised data mining, the values of all attributes except the target variable are known when the model is used
True
True or false: Choosing which customers are most likely to leave is an example of the use of DM results
True
True or false: Finding the characteristics that differentiate my most profitable customers from my less profitable customers is an example of an unsupervised learning task
False
Predictive model of customer churn steps
1. Define churn
2. Test set/ Training set
3/4. Model
5. Model performance
6. Unknown data
7. Predictions
8. Retention or no campaign
What is the result of supervised data mining
A model that predicts some quantity
Subclasses of supervised data mining:
Classification: categorical target
Regression: numerical target
9 Common data mining tasks
1. Classification and class probability estimation
2. Regression
3. Similarity Matching
4. Clustering
5. Co-occurrence grouping
6. Profiling
7. Data Reduction
8. Link Prediction
9. Causal Modeling
What are the 3 unsupervised common data mining tasks
Clustering
Co-occurrence grouping
Profiling
What are the 3 supervised common data mining tasks
Classification
Regression
Causal modeling
What are the 3 common data mining tasks that can be either supervised or unsupervised
Similarity matching
Link prediction
Data Reduction
When should we stop the classification tree from growing
1. Nodes are pure
2. No more variables
3. Information gain from adding an extra variable/node is less than a set threshold level
4. All of the above
All of the above
True or false: Entropy goes from high to low from the root to the branches
True
Tree Structured Model Rules
1. No 2 parents share descendants
2. There are no cycles
3. The branches always point downwards
4. Every example ends up at a leaf node with some specific class determination
How do we create a classification tree from data? (2)
1. Divide and conquer approach
2. Take each data subset and RECURSIVELY apply attribute selection to find the best attribute to partition it
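The divide-and-conquer procedure can be sketched in a few lines of Python. This is a minimal illustration, not a production tree learner: it assumes categorical attributes only, uses information gain as the attribute selection measure, and stops on the three conditions listed in this set (pure node, no variables left, gain below a threshold). The names `build_tree`, `gain`, `split`, `min_gain` and the toy churn rows are all hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def split(rows, attr):
    """Group rows (dicts) by their value of a categorical attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row)
    return groups

def gain(rows, attr, target):
    """Information gain from partitioning rows on attr."""
    parent = [r[target] for r in rows]
    children = split(rows, attr).values()
    weighted = sum(len(c) / len(rows) * entropy([r[target] for r in c])
                   for c in children)
    return entropy(parent) - weighted

def build_tree(rows, attrs, target, min_gain=0.01):
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop: the node is pure, or no variables remain.
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain(rows, a, target))
    # Stop: the best split gains less than the threshold (assumed value).
    if gain(rows, best, target) < min_gain:
        return majority
    rest = [a for a in attrs if a != best]
    # Recurse: apply attribute selection again inside each subset.
    return {best: {value: build_tree(subset, rest, target, min_gain)
                   for value, subset in split(rows, best).items()}}

# Hypothetical toy data, for illustration only.
rows = [
    {"plan": "basic", "calls_support": "yes", "churn": "yes"},
    {"plan": "basic", "calls_support": "no", "churn": "yes"},
    {"plan": "premium", "calls_support": "yes", "churn": "yes"},
    {"plan": "premium", "calls_support": "no", "churn": "no"},
]
tree = build_tree(rows, ["plan", "calls_support"], "churn")
print(tree)
```

Each leaf in the returned nested dict is a class label, so every example ends up at a leaf with a specific class determination.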
What is the point of recursive partitioning
Get smaller and smaller rectangular regions
Divide the entire x space into rectangles so each is as HOMOGENEOUS OR PURE as possible
What do we mean by 'pure'
Containing points that belong to just one class
Splitting is done to reduce what?
the impurities
Information gain
a. Measures the change in entropy due to any amount of new information being added
b. Is only used to calculate entropy
c. Is a measure of correlation between numeric variables
d. Is prone to over-fitting
a
How can we segment the population into groups that differ from each other with respect to some quantity of interest?
Informative/knowable attributes that correlate with the target variable
What's the most important splitting criterion
Information gain (IG)
What is IG based on
A purity measure called ENTROPY
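As a rough sketch of how information gain follows from entropy: the gain of a split is the parent group's entropy minus the size-weighted average entropy of the child groups. The function names and the yes/no data below are made up for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical data: 4 "yes" and 4 "no" examples, split two different ways.
parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print(information_gain(parent, perfect_split))  # 1.0
print(information_gain(parent, useless_split))  # 0.0
```

A split that produces pure children removes all of the parent's entropy, while a split whose children are as mixed as the parent gains nothing.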
Entropy is 0 at ____________ disorder
minimal
The set's members all share the same single property
Entropy is 1 at _________________ disorder
Maximal
Equally mixed properties
The higher the entropy the ____________ information content
More
What is the entropy of a group in which all examples belong to the same class
entropy = -(1 × log2(1)) = 0
NOT A GOOD TRAINING SET FOR LEARNING
What is the entropy of a group with 50% in either class
entropy = -.5 × log2(.5) - .5 × log2(.5) = 1
A GOOD TRAINING SET FOR LEARNING
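The two entropy cards above can be checked with a short Python sketch. It assumes base-2 logarithms (so entropy is measured in bits) and takes class proportions as input; the function name is illustrative.

```python
from math import log2

def entropy(proportions):
    """Entropy in bits from class proportions; p == 0 terms contribute 0."""
    return sum(-p * log2(p) for p in proportions if p > 0)

print(entropy([1.0]))       # all examples in one class: 0.0
print(entropy([0.5, 0.5]))  # evenly mixed two classes: 1.0
print(entropy([0.9, 0.1]))  # mostly one class: between 0 and 1
```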
Is an entropy value of .99 pure or impure
Very impure
Is an entropy value of .62 pure or impure
Less impure
Each bar corresponds to
The entropy of one of the attribute's values
The width of the bar corresponds to
the prevalence of that value in the data
If we select the SINGLE variable that gives the most information gain, we create what?
Very simple segmentation
True or false: Information gain tells us how important a given attribute of the feature vector is
TRUE