Data Mining Final
· ____________________ is based on a theorem of posterior probability and assumes class conditional independence.
o Naïve Bayesian classification
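As a rough illustration of the card above, the sketch below scores each class as P(C) multiplied by the per-attribute likelihoods P(x_i | C), which is where the class-conditional independence assumption enters. The function names and the toy weather data are hypothetical, and no smoothing is applied, so an attribute value never seen with a class drives that class's score to zero.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(class, attribute_index)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts

def predict(row, class_counts, value_counts):
    """Return the class that maximizes P(C) * product of P(x_i | C)."""
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # prior P(C)
        for i, value in enumerate(row):
            # Independence assumption: multiply one likelihood per attribute.
            score *= value_counts[(label, i)][value] / count
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: (outlook, temperature) -> class label
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(("sunny", "mild"), *model))
```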
· When comparing classifiers, ______________________ refers to the ability to construct the classifier efficiently given large amounts of data.
o Scalability
· To estimate classifier accuracy, the ________________ method randomly partitions the data into two independent sets, a training set and a test set.
o Holdout
· When comparing classifiers, ______________________ refers to the computational costs involved in generating and using the given classifier.
o Speed
· _________________________ developed the ID3 decision tree algorithm.
o J. Ross Quinlan
· ____________________ is a top-down recursive induction algorithm, which uses an attribute selection measure to select the attribute tested for each nonleaf node in the tree.
o Decision tree induction
· To estimate classifier accuracy, the ________________ method randomly partitions the data into k mutually exclusive subsets, where each subset is used once for testing and k-1 times for training.
o k-fold cross-validation
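The partitioning described in the card above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name k_fold_splits and the seed parameter are assumptions, and it works on index positions rather than on the tuples themselves.

```python
import random

def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds over n tuples."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # random partitioning
    # Slice the shuffled indices into k mutually exclusive subsets.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]  # this subset is the test set exactly once
        # The remaining k-1 subsets together form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

for train, test in k_fold_splits(n=10, k=5):
    print(sorted(test), len(train))
```

Each index appears in exactly one test fold and in the training set of the other k-1 folds, so the per-fold accuracies can be averaged into a single estimate.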
· _______________________ occurs during learning when the classifier incorporates some particular anomalies of the training data that are not present in the general data overall.
o Overfitting
· ________________________ presumes that the attributes' values are conditionally independent of one another, given the class label of the tuple.
o Class-conditional independence
· When comparing classifiers, ______________________ is the ability of the classifier to make correct predictions given noisy data or data with missing values.
o Robustness
· ____________________ algorithms attempt to improve decision tree induction accuracy by removing tree branches reflecting noise in the data.
o Tree Pruning
· ____________________, also known as unsupervised learning, analyzes data objects without consulting class labels.
o Clustering
· __________ = (TP + TN) / (P + N)
o Accuracy
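Reading the formula above as classifier accuracy, with TP and TN the correctly classified positive and negative tuples and P and N the total positives and negatives, a small sketch of the computation follows. The function name, the positive="yes" default, and the label lists are hypothetical.

```python
def accuracy(y_true, y_pred, positive="yes"):
    """Compute (TP + TN) / (P + N) from true and predicted class labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    p_total = sum(t == positive for t in y_true)                 # P
    n_total = len(y_true) - p_total                              # N
    return (tp + tn) / (p_total + n_total)

# Hypothetical labels: one positive tuple is misclassified as negative.
print(accuracy(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))
```

The parentheses matter: without them, the expression would evaluate as TP + (TN / P) + N.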
· The _________________ is where test data are used to estimate the accuracy of the classification rules.
o Classification Step