Supervised Learning in R
Terms in this set (27)
What function from what package compares new data to a set of labeled training data to make classification predictions?
knn() from the class package
What are the 3 needed arguments to the knn() function?
train - the data for learning
test - the data for testing
cl - the class labels (a factor) for each row of the training data
e.g. result <- knn(train = signs[-1], test = next_sign, cl = sign_types)
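A minimal, runnable sketch of the same call. The signs data isn't included here, so this assumes R's built-in iris data instead, plus the class package (a recommended package that ships with most R installations). The split size and seed are arbitrary choices for illustration.

```r
library(class)

set.seed(42)                               # reproducible random split
train_rows <- sample(nrow(iris), 100)      # 100 random rows for training

iris_train  <- iris[train_rows, 1:4]       # numeric features only
iris_test   <- iris[-train_rows, 1:4]      # the remaining 50 rows for testing
iris_labels <- iris$Species[train_rows]    # true labels for the training rows

# Classify each test row by its single nearest neighbor (k defaults to 1)
result <- knn(train = iris_train, test = iris_test, cl = iris_labels)

# Accuracy: share of predictions that match the actual species
mean(result == iris$Species[-train_rows])
```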
What function builds counts of the factor levels in a data frame?
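Base R's table() is one function that does this. A small sketch, using a hypothetical signs data frame (not the course data):

```r
# Hypothetical data frame standing in for the course data
signs <- data.frame(
  sign_type = factor(c("stop", "speed", "stop", "pedestrian", "speed", "stop"))
)

# table() counts how many times each factor level appears
sign_counts <- table(signs$sign_type)
sign_counts
```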
What is the aggregate() function?
Need to learn this
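For reference, base R's aggregate() splits data into groups and applies a summary function to each group. A quick sketch on the built-in iris data:

```r
# Mean Sepal.Length for each Species (formula: value ~ grouping variable)
species_means <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
species_means

# Swapping in FUN = length counts the rows in each group instead
species_counts <- aggregate(Sepal.Length ~ Species, data = iris, FUN = length)
species_counts
```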
What is the k in knn()?
The number of nearest neighbors that vote on each classification
What are the pros and cons of a higher or a lower number for k?
A low k means smaller "neighborhoods," which lets the model capture more subtle patterns in the data. Conversely, it makes the model more susceptible to noise that can mask the overall trends. A high k means larger neighborhoods, which smooths out noise but can miss those subtle patterns.
What argument do you set to change the number of neighbors in knn()?
The k argument, e.g. knn(train = signs[-1], test = signs_test[-1], cl = sign_types, k = 15)
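A sketch of trying several values of k and comparing test accuracy. It assumes the built-in iris data and the class package; the particular k values are arbitrary, and the resulting accuracies depend on the random split.

```r
library(class)

set.seed(1)
train_rows  <- sample(nrow(iris), 100)
iris_train  <- iris[train_rows, 1:4]
iris_test   <- iris[-train_rows, 1:4]
train_label <- iris$Species[train_rows]
test_label  <- iris$Species[-train_rows]

# Fit knn() once per neighborhood size and record the test accuracy of each
k_values <- c(1, 5, 15, 35)
accuracy <- sapply(k_values, function(k) {
  pred <- knn(train = iris_train, test = iris_test, cl = train_label, k = k)
  mean(pred == test_label)
})

data.frame(k = k_values, accuracy = accuracy)
```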
If you wanted to quickly see what percentage of a column had a certain value, what would you use?
For instance, you want to see what percent of signs had a value of k_15
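Two base R approaches, sketched on a hypothetical vector of predicted sign types: mean() on a logical comparison gives the share for one value, and prop.table() on a table gives the share of every value.

```r
# Hypothetical column of predicted sign types
sign_pred <- c("stop", "speed", "stop", "stop", "pedestrian", "speed", "stop", "speed")

# The comparison gives TRUE/FALSE; mean() counts TRUE as 1, so this is the share of "stop"
mean(sign_pred == "stop")    # 0.5

# prop.table() turns counts into proportions for all values at once
prop.table(table(sign_pred))
```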
What does the attr() function do?
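attr(x, which) reads a named attribute stored on an object, and attr(x, which) <- value sets one. In the kNN context, calling class::knn() with prob = TRUE stores the share of neighbor votes won by the predicted class in an attribute called "prob". A sketch, assuming the built-in iris data:

```r
library(class)

set.seed(7)
train_rows <- sample(nrow(iris), 100)

# prob = TRUE makes knn() record the winning class's share of the votes
pred <- knn(train = iris[train_rows, 1:4], test = iris[-train_rows, 1:4],
            cl = iris$Species[train_rows], k = 15, prob = TRUE)

# attr() reads that attribute back out of the returned factor
vote_share <- attr(pred, "prob")
head(vote_share)

# attr() can also set an attribute on any object
x <- 1:3
attr(x, "note") <- "example"
attr(x, "note")
```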
What is it called when you use a 1 or a 0 to indicate whether a condition exists?
binary dummy variable
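A minimal sketch of building a dummy variable with ifelse(), using hypothetical weather readings:

```r
# Hypothetical weather readings
weather <- c("rain", "sun", "rain", "cloud")

# 1 = the condition holds (it rained), 0 = it doesn't
is_rain <- ifelse(weather == "rain", 1, 0)
is_rain
#> [1] 1 0 1 0
```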
What problem does knn have with scaling and what function works to fix it?
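kNN measures the distance between points, so every feature must be on a similar scale. Otherwise a feature with a range of 200 swamps a feature with a range of 4 in the distance calculation.
Use a custom normalize() function that rescales each feature to a common range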
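One common way to write normalize() is min-max scaling to the 0-1 range; the course's exact version may differ. A sketch, assuming the built-in iris data for the column-wise step:

```r
# Min-max normalize(): rescales a numeric vector to the 0-1 range
# (returns NaN if every value is identical, since max - min is then 0)
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

normalize(c(4, 6, 8))        # 0.0 0.5 1.0
normalize(c(0, 100, 200))    # 0.0 0.5 1.0

# Apply it to every feature column so each one spans 0 to 1
iris_norm <- as.data.frame(lapply(iris[1:4], normalize))
summary(iris_norm$Petal.Length)
```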
In the naive Bayes formula, how do you calculate P(A|B), the probability of A given B?
Take the probability of A and B occurring together, then divide the result by the probability of B: P(A|B) = P(A and B) / P(B)
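A worked example of P(A|B) = P(A and B) / P(B), using a hypothetical log where A is "at the office" and B is "it's a weekday":

```r
# Hypothetical log: where someone was (A) and whether it was a weekday (B)
location <- c("office", "home", "office", "home", "home", "office", "home", "home")
weekday  <- c(TRUE,     TRUE,   TRUE,     FALSE,  FALSE,  TRUE,     TRUE,   FALSE)

p_b       <- mean(weekday)                          # P(B) = 5/8
p_a_and_b <- mean(location == "office" & weekday)   # P(A and B) = 3/8

p_a_given_b <- p_a_and_b / p_b                      # P(A|B) = 3/5
p_a_given_b                                         # 0.6

# Same answer directly: the share of weekday rows that are "office"
mean(location[weekday] == "office")
```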
How would you calculate a basic probability? For example, 2 of the 4 oranges are still green.
Divide the number of outcomes of interest by the total number of outcomes. Here that is 2 green oranges divided by all 4 oranges, which gives 0.5, or 50%.
What function gives you the number of rows in a dataset?
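Base R's nrow() answers this. A quick check on the built-in iris data:

```r
nrow(iris)    # 150 rows
ncol(iris)    # 5 columns
dim(iris)     # both at once: 150 5
```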
What function lets you select a segment or cohort of a data frame based on certain conditions?
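In base R, subset() does this (dplyr's filter() is a common alternative). A sketch on the built-in iris data:

```r
# subset() keeps only the rows where the condition is TRUE
setosa <- subset(iris, Species == "setosa")
nrow(setosa)    # 50

# Conditions can be combined with & or |, and select = picks the columns to keep
big_setosa <- subset(iris, Species == "setosa" & Sepal.Length > 5,
                     select = c(Sepal.Length, Species))
head(big_setosa)
```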
Which function from which package builds a naive Bayes model?
naive_bayes() from the naivebayes package
What are the two main arguments of the naive_bayes() function?
A formula written the same way as in lm() (y ~ x), where the outcome y is explained by the predictor x, and a data argument giving the data frame to learn from.
If you wanted to make predictions from a naive Bayes model, what function would you use and what arguments does it require?
predict(), with the naive Bayes model as the first argument and the new data to make predictions on as the second
What happens if you supply the type = 'prob' argument to the predict() function?
It returns the predicted probability of ALL the possible outcomes, instead of only the single most likely one
What two steps are required to make predictions with naive Bayes?
1. Create a naive Bayes model with the naive_bayes() function
2. Use the model in the predict() function with a new set of data
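In naive Bayes, what are independent variables?
Variables (events) where the occurrence of one does not affect the probability of the other, so P(A and B) = P(A) × P(B). A separate problem is events that never occur together in the training data: their estimated probability is 0, and because naive Bayes multiplies probabilities, that 0 makes the whole result 0.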
Why is naive Bayes naive?
Calculating the joint probability of multiple simultaneous events is challenging. To simplify, naive Bayes assumes that the predictors are independent of one another within each class (which they almost never are in reality).
How does the naive Bayes model handle the issue of multiple overlapping probabilities?
It calculates each conditional probability individually, then multiplies them together. For instance, the probability of A given B and C is proportional to P(A) × P(B|A) × P(C|A). Dividing by the sum of these products over every possible value of A turns them back into probabilities.
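A from-scratch base R sketch of that multiply-then-rescale step, on hypothetical data. This is an illustration of the idea, not how the naivebayes package is implemented. For a new observation (weekday, night) it scores each location as P(location) × P(weekday | location) × P(night | location).

```r
# Hypothetical training data: where someone was, plus two predictors
train <- data.frame(
  location = c("office", "office", "office", "home", "home", "home", "home", "home"),
  daytype  = c("weekday", "weekday", "weekday", "weekday", "weekend", "weekend", "weekday", "weekend"),
  hour     = c("day", "day", "night", "night", "day", "night", "night", "night")
)

# Score one class: prior times the conditional probability of each observed predictor value
nb_score <- function(cls, new_obs, data) {
  rows  <- data[data$location == cls, ]
  prior <- nrow(rows) / nrow(data)                 # P(class)
  likes <- sapply(names(new_obs), function(col) {
    mean(rows[[col]] == new_obs[[col]])            # P(predictor value | class)
  })
  prior * prod(likes)                              # the "naive" step: multiply them
}

new_obs <- list(daytype = "weekday", hour = "night")
classes <- unique(train$location)
scores  <- sapply(classes, nb_score, new_obs = new_obs, data = train)

posterior <- scores / sum(scores)   # rescale so the probabilities sum to 1
posterior                           # office ~0.385, home ~0.615
names(which.max(posterior))         # predicted class: "home"
```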
What is the Laplace correction or Laplace estimator?
It adds a small number (typically 1) to the count of each event so that no estimated probability is exactly 0. Because the naive Bayes formula multiplies the probabilities together, a single zero makes the whole product 0. This is a 'quick fix' to handle that issue.
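A sketch of the correction on hypothetical counts, using the common add-k form (count + laplace) / (total + laplace × number of levels). The naivebayes package exposes this through its laplace argument, though its internal details may differ.

```r
# Hypothetical data: the office was never observed on a weekend
office_daytype <- c("weekday", "weekday", "weekday")
levels_daytype <- c("weekday", "weekend")

# Without correction: 0 of 3 office rows are weekends, so the probability is 0
p_plain <- mean(office_daytype == "weekend")

# With a Laplace correction of 1: add 1 to every level's count before dividing
laplace <- 1
counts  <- table(factor(office_daytype, levels = levels_daytype))
weekend <- as.numeric(counts["weekend"])
p_smoothed <- (weekend + laplace) / (sum(counts) + laplace * length(levels_daytype))

p_plain      # 0
p_smoothed   # 0.2, i.e. (0 + 1) / (3 + 2)
```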
In the naive Bayes formula, how would you combine multiple predictors?
Add them to the right-hand side of the formula with +, e.g. y ~ x1 + x2.
In the naive_bayes() function, how would you apply the Laplace correction?
Pass the value through the laplace argument, e.g. naive_bayes(y ~ x, data = df, laplace = 1)
Describe how knn() works
It's a classification algorithm, meaning it tries to categorize new data based on previously labeled data, which it calls a training set. Basically, it finds the k training examples most similar (closest) to each new data point and assigns the category that most of those neighbors share.
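A from-scratch sketch of those steps for a single new point, using a hypothetical helper knn_one() and toy data (class::knn() is the real tool; this only illustrates the idea). It computes Euclidean distances, takes the k closest training rows, and returns the majority label.

```r
# Hypothetical helper: classify one new point by majority vote of its k nearest neighbors
knn_one <- function(train, labels, new_point, k = 3) {
  # Euclidean distance from the new point to every training row
  dists <- sqrt(rowSums(sweep(as.matrix(train), 2, new_point)^2))
  # Labels of the k closest training rows
  nearest <- labels[order(dists)[1:k]]
  # Majority vote (on a tie, which.max picks the first label alphabetically)
  votes <- table(nearest)
  names(votes)[which.max(votes)]
}

train  <- data.frame(x = c(1, 2, 1, 8, 9, 8), y = c(1, 1, 2, 8, 8, 9))
labels <- c("red", "red", "red", "blue", "blue", "blue")

knn_one(train, labels, new_point = c(1.5, 1.5), k = 3)   # "red"
knn_one(train, labels, new_point = c(8.5, 8.5), k = 3)   # "blue"
```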