Data Mining Notes
Terms in this set (17)
What is data mining?
using mathematical techniques to mine knowledge from data in order to make decisions.
Explain the data mining formula: f(X) = Y
f = model
X = input / features / covariates
Y = output / target / prediction / decision
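A minimal sketch of f(X) = Y in code, assuming a hand-written rule stands in for the model. The feature names, the 0.5 weight, and the threshold of 3 are made up for illustration; a real model would be learned from data.

```python
def f(x):
    """Toy model f: map input features X to a prediction Y.

    The weight (0.5) and the threshold (3) are hypothetical values.
    """
    score = x["past_purchases"] + 0.5 * x["visits_last_month"]
    return "accept" if score >= 3 else "not accept"


X = {"past_purchases": 2, "visits_last_month": 4}  # input / features
Y = f(X)                                           # output / prediction
print(Y)
```

Here f plays the role of the model, the dictionary is X, and the returned label is Y.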
X indicates information we....
Y indicates information we...
want to predict
Most of the time, a data-driven predictive problem can be decomposed into a set of ...?
subtasks which fall into a set of well-defined common problems.
There are only a handful of fundamentally different types of problems. (T/F)?
What are the 5 types of problems we need to know for Data Mining?
1. classification
2. regression
3. clustering
4. co-occurrence grouping
5. data reduction
Given an input X, predict a categorical class label for the target variable Y.
-Will the customer accept the coupon? (Label: accept/not accept)
-What is Jack's final grade in math? (Label: A, B, C, D)
-What is the gender of this customer? (Label: Female/Male)
These are examples of what type of problem?
In terms of the mathematical representation of classification, what is the Goal of Classification?
find a decision boundary (represented by a model) that separates one class (e.g., Churn = 1) from the other (Churn = 0)
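The idea of a decision boundary can be sketched in one dimension, where the boundary is just a cut point on a single feature. This is an illustrative threshold rule, not a method named in these notes; the `complaints` and `churn` values are made-up data.

```python
def fit_threshold(xs, ys):
    """Return the cut point t with the fewest errors for the rule: predict 1 if x > t."""
    values = sorted(set(xs))
    # Candidate boundaries: midpoints between consecutive distinct feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum((x > t) != bool(y) for x, y in zip(xs, ys))

    return min(candidates, key=errors)


def predict(x, t):
    """Classify one input: Churn = 1 above the boundary, Churn = 0 otherwise."""
    return 1 if x > t else 0


complaints = [0, 1, 1, 2, 4, 5, 6, 7]  # hypothetical feature X
churn = [0, 0, 0, 0, 1, 1, 1, 1]       # hypothetical class label Y
boundary = fit_threshold(complaints, churn)
print(boundary, predict(5, boundary), predict(1, boundary))
```

With more than one feature the boundary becomes a line or surface, but the goal is the same: separate one class from the other.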
-given an input X, predict a numerical value for the target variable Y.
-How much data will the user use in August?
-How many votes does Jack get?
-What will the sale price of the house be?
What types of problems are these?
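As a rough sketch of predicting a numerical target, the snippet below fits a straight line by ordinary least squares. Least squares is one common choice, not something these notes prescribe, and the house sizes and prices are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


size_sqm = [50, 80, 100, 120]    # hypothetical input X
price_k = [150, 240, 300, 360]   # hypothetical numeric target Y (in thousands)
slope, intercept = fit_line(size_sqm, price_k)
predicted = slope * 90 + intercept  # predicted price for a 90 sqm house
print(round(predicted, 2))
```

Unlike classification, the output is a number on a continuous scale rather than a class label.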
to group individuals together by their similarity, so that individuals in the same group (cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).
Examples of clusters
Cluster 1: retired, senior, active during the day
Cluster 2: working professionals, young, active at night
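Grouping by similarity can be sketched with a tiny one-dimensional k-means. The choice of k-means is an assumption here (the notes do not name an algorithm), and the activity hours and starting centers are made up.

```python
def kmeans_1d(points, centers, iterations=10):
    """Very small k-means on numbers: assign to nearest center, then recompute centers."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # A center with no assigned points keeps its previous position.
        centers = [sum(c) / len(c) if c else old for c, old in zip(clusters, centers)]
    return centers, clusters


active_hour = [9, 10, 11, 21, 22, 23]  # hypothetical typical hour of activity per user
centers, clusters = kmeans_1d(active_hour, centers=[9, 23])
print(centers, clusters)
```

Users in the same resulting group are closer to each other (in activity hour) than to users in the other group, which mirrors the daytime and nighttime clusters above.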
Co-occurrence grouping is also known as....? (3)
-frequent itemset mining
-association rule discovery
To find ________________ between entities based on transactions involving them
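One simple way to look at co-occurrence in transactions is to count how often each pair of items appears together. This is only a counting sketch, not a full association rule algorithm, and the baskets are invented.

```python
from collections import Counter
from itertools import combinations


def pair_counts(transactions):
    """Count, for every pair of items, the number of transactions containing both."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


transactions = [  # hypothetical shopping baskets
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]
counts = pair_counts(transactions)
top_pair, support = counts.most_common(1)[0]
print(top_pair, support)
```

Pairs with high counts are candidates for rules of the form "customers who buy one item also tend to buy the other".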
Explain data reduction.
To replace a large set of data with a smaller set of data that contains much of the important information in the large set.
-Usually involves loss of information; trade-off
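The trade-off can be illustrated with block averaging, used here as one very simple form of data reduction (an assumption for illustration, not a technique named in these notes). The usage numbers are made up.

```python
def reduce_by_block_mean(values, block_size):
    """Replace each block of consecutive values with its mean."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [sum(b) / len(b) for b in blocks]


def reconstruct(reduced, block_size, length):
    """Expand the reduced data back to the original length by repeating each mean."""
    expanded = [m for m in reduced for _ in range(block_size)]
    return expanded[:length]


daily_usage = [2, 4, 6, 8, 10, 12]  # hypothetical large data set
reduced = reduce_by_block_mean(daily_usage, 3)
approx = reconstruct(reduced, 3, len(daily_usage))
# Information lost: total absolute difference between original and reconstruction.
loss = sum(abs(a - b) for a, b in zip(daily_usage, approx))
print(reduced, loss)
```

The reduced list is a third of the size and keeps the overall level of each block, but the within-block variation cannot be recovered, which is the loss of information mentioned above.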