Machine Learning Interview Deck
Terms in this set (35)
Used to define the performance of a classification model, by showing the number of true positives, false positives, true negatives, and false negatives
is a summary of prediction results on a classification problem
The confusion matrix shows the ways in which your classification model is confused when it makes predictions. It gives us insight not only into the errors being made by a classifier but, more importantly, into the types of errors that are being made.
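A minimal sketch of the four counts, assuming binary labels coded as 1 (positive) and 0 (negative). The function name `confusion_counts` and the sample labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1          # predicted positive, actually positive
        elif predicted == 1 and actual == 0:
            fp += 1          # predicted positive, actually negative
        elif predicted == 0 and actual == 0:
            tn += 1          # predicted negative, actually negative
        else:
            fn += 1          # predicted negative, actually positive
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
```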
ROC curve (receiver operating characteristic)
- visual representation of a test's performance as we vary the cut-points for a positive test
- vertical axis: sensitivity (true positive fraction)
- horizontal axis: 1-specificity (false positive fraction)
- the curve that comes closest to the top left-hand corner is the best
- can be used to compare two tests to each other
- "area under the curve" (AUC) closer to 1 means a better test: 1 is a perfect test, 0.5 is no better than chance
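A rough sketch of how the curve and its area could be computed by sweeping the cut-point over the observed scores. The helper names `roc_points` and `auc` and the sample scores are assumptions for illustration; tied scores are handled by treating each distinct score as one cut-point.

```python
def roc_points(y_true, scores):
    """Return (false positive fraction, true positive fraction) points,
    sweeping the cut-point from the highest score to the lowest."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical test results: 1 = condition present, score = test output.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, scores)
print(curve, auc(curve))
```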
Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample. The procedure has a single parameter called k that refers to the number of groups that a given data sample is to be split into. As such, the procedure is often called k-fold cross-validation
Shuffle the dataset randomly.
Split the dataset into k groups
For each unique group: take the group as a hold-out or test data set
Take the remaining groups as a training data set
Fit a model on the training set and evaluate it on the test set
Retain the evaluation score and discard the model
Summarize the skill of the model using the mean of model evaluation scores
primarily used in applied machine learning to estimate the skill of a machine learning model on unseen data
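The k-fold steps above can be sketched in plain Python. This is only an illustration: the "model" here is a stand-in that predicts the mean of the training targets, and the names `k_fold_scores`, `fit_mean`, and `mean_abs_error` are hypothetical.

```python
import random


def k_fold_scores(data, k, fit, evaluate, seed=0):
    """Run k-fold cross-validation and return one score per fold."""
    rows = list(data)
    random.Random(seed).shuffle(rows)          # shuffle the dataset randomly
    folds = [rows[i::k] for i in range(k)]     # split the dataset into k groups
    scores = []
    for i in range(k):
        test = folds[i]                        # hold-out group
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = fit(train)                     # fit on the remaining groups
        scores.append(evaluate(model, test))   # keep the score, drop the model
    return scores


def fit_mean(train):
    """Stand-in model: always predict the mean of the training targets."""
    return sum(train) / len(train)


def mean_abs_error(model, test):
    return sum(abs(y - model) for y in test) / len(test)


data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
scores = k_fold_scores(data, k=5, fit=fit_mean, evaluate=mean_abs_error)
mean_score = sum(scores) / len(scores)         # summarize with the mean score
print(scores, mean_score)
```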
Bias is the difference between the average prediction of our model and the correct value which we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the problem. This leads to high error on both training and test data.
Variance is the variability of the model's prediction for a given data point; it tells us how much the prediction would change if the model were trained on a different sample. A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn't seen before. As a result, such models perform very well on training data but have high error rates on test data.
If our model is too simple and has very few parameters, then it may have high bias and low variance. On the other hand, if our model has a large number of parameters, then it is likely to have high variance and low bias. So we need to find a good balance that neither overfits nor underfits the data.
A regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces.
type of linear regression that uses shrinkage. Shrinkage is where coefficient estimates are shrunk towards a central point (zero, in the case of lasso). The lasso procedure encourages simple, sparse models, i.e. models with fewer non-zero coefficients
a way to create a parsimonious model when the number of predictor variables in a set exceeds the number of observations, or when a data set has multicollinearity (correlations between predictor variables).
It works in part because it doesn't require unbiased estimators. While least squares produces unbiased estimates, their variances can be so large that the estimates may be wholly inaccurate. Ridge regression adds just enough bias to make the estimates reasonably reliable approximations to true population values.
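To make the shrinkage in lasso and ridge concrete, here is a small sketch for the simplest possible case: one predictor and no intercept, where both penalized problems have closed-form solutions. The objective scaling (penalty `lam * b**2` for ridge, `lam * abs(b)` for lasso, added to the plain sum of squared errors) and the function names are assumptions chosen for illustration; libraries often scale the penalty differently.

```python
def _sums(x, y):
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy, sxx


def ols_slope(x, y):
    """Minimizer of sum((y - b*x)^2)."""
    sxy, sxx = _sums(x, y)
    return sxy / sxx


def ridge_slope(x, y, lam):
    """Minimizer of sum((y - b*x)^2) + lam * b^2: shrinks b, never exactly to 0."""
    sxy, sxx = _sums(x, y)
    return sxy / (sxx + lam)


def lasso_slope(x, y, lam):
    """Minimizer of sum((y - b*x)^2) + lam * |b| (soft-thresholding):
    a large enough penalty sets b exactly to 0."""
    sxy, sxx = _sums(x, y)
    magnitude = max(abs(sxy) - lam / 2, 0.0)
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * magnitude / sxx


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0]
y = [1.0, 2.0, 2.0]
print(ols_slope(x, y), ridge_slope(x, y, 14.0), lasso_slope(x, y, 4.0), lasso_slope(x, y, 30.0))
```

The last call shows the variable-selection behaviour: with a large penalty the lasso coefficient is exactly zero, while the ridge coefficient only gets smaller.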
In statistics and, in particular, in the fitting of linear or logistic regression models, the elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods.
A data set in which the input and the desired output are both provided to the computer.
In practice, the training dataset often consists of pairs of an input vector (or scalar) and the corresponding output vector (or scalar), which is commonly denoted as the target (or label). The current model is run with the training dataset and produces a result, which is then compared with the target, for each input vector in the training dataset. Based on the result of the comparison and the specific learning algorithm being used, the parameters of the model are adjusted.
A data set in which the input is provided to the computer and the desired output is known, so that it can be determined how well a machine learning algorithm is working. A validation dataset is a dataset of examples used to tune the hyperparameters (e.g. the architecture) of a classifier. It is sometimes also called the development set or the "dev set".
The Akaike information criterion (AIC) is an estimator of the relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model, relative to each of the other models. Thus, AIC provides a means for model selection.
works to balance the complexity of a model versus how well it fits the data
In statistics, the Bayesian information criterion (BIC) or Schwarz information criterion (also SIC, SBC, SBIC) is a criterion for model selection among a finite set of models; the model with the lowest BIC is preferred.
BIC = k*ln(n) - 2*ln(L(theta)), where L(theta) is the probability of obtaining the data given the model
L(theta) is the maximized value of the likelihood function of the model, i.e. theta holds the parameter values that maximize the likelihood function
k = number of parameters your model estimates
n = number of observations
theta = set of all parameters
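A short sketch of both criteria as functions of the maximized log-likelihood. The BIC follows the formula above; the AIC uses its standard definition, 2k - 2 ln(L). The two candidate models and their log-likelihood values are invented for illustration.

```python
import math


def aic(log_likelihood, k):
    """AIC = 2k - 2 ln(L), with ln(L) the maximized log-likelihood."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """BIC = k ln(n) - 2 ln(L), with n the number of observations."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical candidates fitted to the same n = 100 observations.
n = 100
model_a = {"log_likelihood": -120.0, "k": 3}   # simpler, slightly worse fit
model_b = {"log_likelihood": -118.0, "k": 6}   # more parameters, slightly better fit

for name, m in (("A", model_a), ("B", model_b)):
    print(name, aic(m["log_likelihood"], m["k"]), bic(m["log_likelihood"], m["k"], n))
```

In this made-up case both criteria prefer model A (lower value): the small gain in fit from model B does not offset its extra parameters.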
A loss function is a measure of how well a prediction model does in terms of being able to predict the expected outcome. One of the most commonly used methods of finding the minimum point of a function is "gradient descent".
Support Vector Machine
Supervised learning tool that seeks a dividing hyperplane in any number of dimensions; it can be used for regression or classification
Maximizes the margin, i.e. the minimum distance from the decision boundary to the nearest training points (the support vectors)
The kernel trick avoids the explicit mapping that is needed to get linear learning algorithms to learn a nonlinear function or decision boundary. For all x and x' in the input space X, certain functions k(x, x') can be expressed as an inner product in another space V. The function k is often referred to as a kernel or a kernel function.
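A small numeric check of that statement, using the degree-2 polynomial kernel on 2-D inputs as an assumed example. The kernel value computed directly in the input space matches the inner product after an explicit mapping into a 3-D space, so the mapping never has to be carried out.

```python
import math


def poly_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x . z)^2 on 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def feature_map(x):
    """Explicit map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2) into 3-D space."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Arbitrary points chosen for illustration.
x, z = (1.0, 2.0), (3.0, -1.0)
via_kernel = poly_kernel(x, z)                        # computed in the input space
via_mapping = dot(feature_map(x), feature_map(z))     # computed in the mapped space
print(via_kernel, via_mapping)
```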
A supervised algorithm used for regression or classification that uses a collection of decision trees; the trees "vote" on the best prediction
Adding trees generally makes the forest's predictions more stable and, up to a point, more accurate; the gains level off as the number of trees grows.
adding more trees does not by itself cause overfitting, because averaging over many trees reduces variance
The idea of boosting methods is to combine several weak learners to form a stronger one.
an ensemble method for improving the model predictions of any given learning algorithm. The idea of boosting is to train weak learners sequentially, each trying to correct its predecessor.
an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient. In machine learning, we use gradient descent to update the parameters of our model by minimizing our loss functions
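A minimal sketch of the update rule on a one-dimensional function. The target function f(x) = (x - 3)^2, the learning rate, and the step count are arbitrary choices for illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step in the direction of the negative gradient."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(minimum)
```

In a model, `x` would be the parameter vector and `grad` the gradient of the loss with respect to those parameters.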
supervised ML algorithm that can be used for both classification and regression
assumes that similar data points exist in close proximity
assumes a data point belongs to the most frequent class of the data point's "k-nearest neighbors"
1. Load the data
2. Initialise the value of k
3. For getting the predicted class, iterate from 1 to total number of training data points
3a. Calculate the Euclidean distance between test data and each row of training data.
3b. Sort the calculated distances in ascending order based on distance values
3c. Get top k rows from the sorted array
3d. Get the most frequent class of these rows
3e. Return the predicted class
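The steps above, sketched in plain Python for classification. The function name `knn_predict` and the toy points are made up for illustration; `math.dist` needs Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k):
    """Predict the class of `query` from its k nearest training rows."""
    distances = []
    for row, label in zip(train_rows, train_labels):
        # Euclidean distance between the query and each training row
        distances.append((math.dist(row, query), label))
    distances.sort(key=lambda pair: pair[0])           # ascending by distance
    top_k = [label for _, label in distances[:k]]      # labels of the k closest rows
    return Counter(top_k).most_common(1)[0][0]         # most frequent class


# Hypothetical training data: two well-separated groups.
train_rows = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 7), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_rows, train_labels, (2, 2), k=3))
print(knn_predict(train_rows, train_labels, (6.5, 6.5), k=3))
```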
an unsupervised ML algorithm that aims to group similar data points together, apart from dissimilar points, and to discover underlying patterns
Informally, goal is to find groups of points that are close to each other but far from points in other groups
Each cluster is defined entirely and only by its centre, or mean value µk
start by defining the number of centroids; a centroid is a real or imaginary location representing the center of a cluster
each data point is allocated to the nearest centroid, which reduces the within-cluster sum of squares
the K-means algorithm identifies k centroids, and then allocates every data point to the nearest cluster, while keeping the within-cluster sum of squares as small as possible.
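A bare-bones sketch of the assign-then-update loop for one-dimensional data. The starting centroids are passed in by hand here (an assumption for reproducibility; they are usually chosen at random), and the function name `k_means_1d` is hypothetical.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its cluster."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assignment step: nearest centroid by squared distance
            nearest = min(range(len(centroids)), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical data with two obvious groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)
```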
Principal Component Analysis
method for reducing the dimension of a feature space
this technique makes it so we are considering fewer features when designing our model and we are less likely to overfit
method for feature extraction, so that we can combine our input variables in such a way that we can drop the less important variables
PCA produces new variables (the principal components) that are uncorrelated with one another, which is useful for linear models
A function (for example, ReLU or sigmoid) that takes in the weighted sum of all of the inputs from the previous layer and then generates and passes an output value (typically nonlinear) to the next layer.
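A small sketch of a single neuron using the two activations named above. The bias term and the helper name `neuron` are additions for illustration; the inputs and weights are arbitrary.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs (plus an assumed bias), passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


inputs, weights, bias = [1.0, 2.0], [0.5, -1.0], 0.5   # weighted sum is -1.0
print(neuron(inputs, weights, bias, relu))
print(neuron(inputs, weights, bias, sigmoid))
```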
Given an artificial neural network and an error function, the method calculates the gradient of the error function with respect to the neural network's weights.
The "backwards" part of the name stems from the fact that calculation of the gradient proceeds backwards through the network, with the gradient of the final layer of weights being calculated first and the gradient of the first layer of weights being calculated last. Partial computations of the gradient from one layer are reused in the computation of the gradient for the previous layer. This backwards flow of the error information allows for efficient computation of the gradient at each layer versus the naive approach of calculating the gradient of each layer separately.
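A tiny worked sketch of the backward pass, assuming the smallest network that still has two layers: one input, one sigmoid hidden unit, one linear output, and a squared-error loss. All names and numbers are illustrative. The gradient for the last weight is computed first, and the intermediate term `d_h` is reused for the first weight.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, w2, x):
    h = sigmoid(w1 * x)      # hidden activation (first layer)
    y = w2 * h               # linear output (final layer)
    return h, y


def loss(w1, w2, x, target):
    _, y = forward(w1, w2, x)
    return (y - target) ** 2


def backprop(w1, w2, x, target):
    """Return (dLoss/dw1, dLoss/dw2), computed from the output backwards."""
    h, y = forward(w1, w2, x)
    d_y = 2 * (y - target)            # gradient at the output
    d_w2 = d_y * h                    # final-layer weight, computed first
    d_h = d_y * w2                    # partial result reused by the earlier layer
    d_w1 = d_h * h * (1 - h) * x      # first-layer weight, computed last
    return d_w1, d_w2


w1, w2, x, target = 0.5, -1.5, 2.0, 1.0
d_w1, d_w2 = backprop(w1, w2, x, target)
print(d_w1, d_w2)
```

One way to sanity-check such gradients is to compare them with finite differences of `loss`.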
The process of a method calling itself in order to solve a problem.
Recursion is a method of solving problems that involves breaking a problem down into smaller and smaller subproblems until you get to a small enough problem that it can be solved trivially. Usually recursion involves a function calling itself. While it may not seem like much on the surface, recursion allows us to write elegant solutions to problems that may otherwise be very difficult to program.
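A classic illustration, assuming non-negative integer input: the factorial function calls itself on a smaller subproblem until it reaches the trivially solvable case.

```python
def factorial(n):
    """Return n! for a non-negative integer n."""
    if n <= 1:                        # base case: small enough to solve trivially
        return 1
    return n * factorial(n - 1)       # recursive case: a smaller subproblem


print(factorial(5))
```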
the practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer.
Cloud computing makes computer system resources, especially storage and computing power, available on demand without direct active management by the user. The term is generally used to describe data centers available to many users over the Internet. Large clouds, predominant today, often have functions distributed over multiple locations from central servers.
What vendor flavor of SQL are you familiar with? How many years have you worked with it?
What's a primary key?
What's a foreign key?
key used to link 2 tables
an attribute in one table that is linked to the primary key of another table
foreign key constraint is used to prevent actions that would destroy links between tables
What's an inner join?
How do you identify duplicate records in a table?
SELECT attributes FROM table GROUP BY attributes HAVING COUNT(*)>1
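To try the query, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. The `customers` table, its columns, and its rows are hypothetical; `COUNT(*)` is added to the SELECT list only to show how many copies exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers (name, email) VALUES (?, ?)",
    [
        ("Ann", "ann@example.com"),
        ("Bob", "bob@example.com"),
        ("Ann", "ann@example.com"),   # duplicate of the first row
    ],
)

# Group on the columns that define a duplicate and keep groups with more than one row.
duplicates = conn.execute(
    "SELECT name, email, COUNT(*) FROM customers "
    "GROUP BY name, email HAVING COUNT(*) > 1"
).fetchall()
conn.close()
print(duplicates)
```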
What's the difference between DML and DDL? Give examples.
DDL (Data Definition Language) consists of the SQL commands that can be used to define the database schema. It simply deals with descriptions of the database schema and is used to create and modify the structure of database objects in the database.
CREATE TABLE, ALTER TABLE, DROP
The SQL commands that deal with the manipulation of data present in the database belong to DML, or Data Manipulation Language, and this includes most of the SQL statements.
INSERT INTO table
What's an array?
collection of items stored at contiguous memory locations. The idea is to store multiple items of same type together.
allows random access to elements by index.
What's a dataframe?
What's a NaN value?
NaN (Not a Number) does not equal anything, not even another NaN.
represents basically an invalid number
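A quick check of the "does not equal anything" behaviour in plain Python. Because equality fails, a dedicated test such as `math.isnan` is the usual way to detect NaN.

```python
import math

nan = float("nan")

nan_equals_itself = (nan == nan)    # comparison with itself is False
nan_differs = (nan != nan)          # so "not equal" is True
detected = math.isnan(nan)          # the reliable way to test for NaN

print(nan_equals_itself, nan_differs, detected)
```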