Data Mining Notes
What is data mining?
Using mathematical techniques to extract knowledge from data in order to make decisions.
Explain the data mining formula: f(X) = Y
f = model
X = input / features / covariates
Y = output / target / prediction / decision
X indicates information we....
already know
Y indicates information we...
want to predict
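A minimal Python sketch of f(X) = Y, for illustration only. The feature name past_purchases and the threshold of 3 are invented here and do not come from the notes; a real model would be learned from data rather than written by hand.

```python
from typing import Dict

Features = Dict[str, float]


def f(x: Features) -> str:
    """Hypothetical hand-written model: maps known features X to a prediction Y."""
    # The threshold 3 is an assumption made up for this example.
    return "accept" if x["past_purchases"] >= 3 else "not accept"


X = {"past_purchases": 5.0}  # information we already know
Y = f(X)                     # information we want to predict
print(Y)
```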
Most of the time, a data-driven predictive problem can be decomposed into a set of ...?
subtasks which fall into a set of well-defined common problems.
There are only a handful of fundamentally different types of problems. (T/F)?
true
What are the 5 types of problems we need to know for Data Mining?
1. classification
2. regression
3. clustering
4. co-occurrence grouping
5. data reduction
Explain classification.
Given an input X, predict a categorical class label for the target variable Y.
-Will the customer accept the coupon? (Label: accept/not accept)
-What is Jack's final grade in math? (Label: A, B, C, D)
-What is the gender of this customer? (Label: Female/Male)
These are examples of what type of problem?
Classification
In terms of the mathematical representation of classification, what is the Goal of Classification?
find a decision boundary (represented by a model) that separates one class (e.g., Churn = 1) from the other (Churn = 0)
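One way to picture a decision boundary is with a single feature and a threshold. The sketch below assumes a made-up feature (number of complaints per customer) and places the boundary halfway between the two class means. This is just one simple technique chosen for illustration, not the method the notes prescribe.

```python
def fit_threshold(xs, ys):
    """Place the boundary midway between the mean of class 0 and the mean of class 1."""
    class0 = [x for x, y in zip(xs, ys) if y == 0]
    class1 = [x for x, y in zip(xs, ys) if y == 1]
    mean0 = sum(class0) / len(class0)
    mean1 = sum(class1) / len(class1)
    return (mean0 + mean1) / 2


def predict(x, boundary):
    """Predict Churn = 1 above the boundary, Churn = 0 otherwise.

    Assumes class 1 has the larger mean, which holds for the toy data below.
    """
    return 1 if x > boundary else 0


# Hypothetical training data: complaints per customer and whether they churned.
complaints = [0, 1, 2, 6, 7, 8]
churn = [0, 0, 0, 1, 1, 1]

boundary = fit_threshold(complaints, churn)
print(boundary, predict(5, boundary), predict(3, boundary))
```

Every input falls on one side of the boundary or the other, which is what makes the output a class label rather than a number.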
Explain "Regression".
-value estimation
-given an input X, predict a numerical value for the target variable Y.
-How much data will the user use in August?
-How many votes will Jack get?
-What will the sale price of the house be?
What types of problems are these?
Regression
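A minimal sketch of value estimation, assuming one feature and a straight-line model fitted by ordinary least squares. The data points are invented for illustration; the notes do not name a specific regression method.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: month number vs. gigabytes of data used.
months = [1, 2, 3, 4]
usage_gb = [3, 5, 7, 9]

slope, intercept = fit_line(months, usage_gb)
predicted = slope * 5 + intercept  # numerical estimate for month 5
print(slope, intercept, predicted)
```

Unlike classification, the prediction here is a number on a continuous scale, not a label.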
Explain clustering.
to group individuals together by their similarity, so that individuals in the same group (cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).
Examples of clusters
Cluster 1: retired, senior, active during the day
Cluster 2: working professionals, young, active at night
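A small sketch of grouping by similarity, using a one-dimensional k-means with two clusters. K-means is a technique picked here for illustration and is not named in the notes. The peak activity hours are made-up data, and the starting centers (the minimum and maximum value) are a simplifying assumption.

```python
def kmeans_1d(values, iterations=10):
    """Two-cluster k-means on a list of numbers.

    Centers start at the smallest and largest value (an assumption that keeps
    the example deterministic).
    """
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to the nearest center.
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[idx].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups, centers


# Hypothetical peak activity hour (0-23) for six users.
peak_hours = [10, 11, 13, 21, 22, 23]

groups, centers = kmeans_1d(peak_hours)
print(groups)
```

No labels are given to the algorithm; the "day" and "night" groups emerge only from how close the values are to each other.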
Co-occurrence grouping is also known as....? (3)
-frequent itemset mining
-association rule discovery
-market-basket analysis
Co-occurrence grouping:
To find ________________ between entities based on transactions involving them
associations
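A minimal sketch of finding associations from transactions by counting how often pairs of items appear together. The baskets are invented, and support and confidence are the usual association-rule measures, added here as an assumption about what "association" would be measured by.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in transactions:
    item_counts.update(basket)
    # Sort so each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(basket), 2))

n = len(transactions)
# Support: share of all transactions that contain both items.
support = pair_counts[("bread", "milk")] / n
# Confidence of "bread -> milk": share of bread baskets that also contain milk.
confidence = pair_counts[("bread", "milk")] / item_counts["bread"]
print(support, confidence)
```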
Explain data reduction.
To replace a large set of data with a smaller set of data that contains much of the important information in the large set.
-Usually involves loss of information; trade-off
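A small sketch of the trade-off, assuming the reduction is done by replacing each block of readings with its average. Averaging is only one possible reduction technique, chosen here for illustration; the readings are made up.

```python
def block_means(values, block_size):
    """Replace each consecutive block of readings with its mean."""
    return [
        sum(values[i:i + block_size]) / len(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


# Hypothetical readings: six values reduced to two.
readings = [1, 2, 3, 10, 11, 12]
reduced = block_means(readings, 3)
print(reduced)
```

The reduced list still shows the jump from low to high values, but the individual readings within each block can no longer be recovered. That is the loss of information.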
