Statistical Learning - Model Selection and Nonparametric Models
Cross-validation
A method for evaluating a statistical model or algorithm that has free parameters. Divide the training data into several parts and, in turn, use each part to test the procedure fitted to the remaining parts. It can be used for model selection or for parameter estimation when there are many parameters. Jackknifing is a closely related technique.
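The procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (k_fold_indices, cross_val_error, fit_mean, fit_line), the squared-error score, and the toy data are hypothetical choices, not part of the original definition.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_error(xs, ys, fit, predict, k=5, seed=0):
    """Mean squared error over all points, each predicted while held out."""
    total = 0.0
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((predict(model, xs[i]) - ys[i]) ** 2 for i in held_out)
    return total / len(xs)

# Candidate model 1 (hypothetical): always predict the training mean of y.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def predict_mean(model, x):
    return model

# Candidate model 2 (hypothetical): least-squares straight line.
def fit_line(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept

xs = [float(x) for x in range(10)]
ys = [2.0 * x + 1.0 for x in xs]          # toy data lying exactly on a line
cv_mean = cross_val_error(xs, ys, fit_mean, predict_mean)
cv_line = cross_val_error(xs, ys, fit_line, predict_line)
# Model selection: keep the candidate with the smaller held-out error.
best = "line" if cv_line < cv_mean else "mean"
```

On this toy data the straight line has a held-out error near zero, so it is the model selected.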
Bayesian model selection
Selecting the model which assigns the highest probability to the data after all parameters have been integrated out. See my research demo page. Also see "Bayesian Interpolation", "Automatic choice of dimensionality for PCA", "Hyperparameter Selection for Self-Organizing Maps", and Heckerman's papers.
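A small worked case may help, assuming a deliberately simple setting that is not from the original text: a sequence of coin flips, with one model that says the coin is fair (no free parameters) and another that gives the bias a uniform prior and integrates it out. The function names are hypothetical.

```python
from math import lgamma, log

def log_evidence_fair(heads, tails):
    """log P(data | fair coin). Nothing to integrate out."""
    return (heads + tails) * log(0.5)

def log_evidence_biased(heads, tails):
    """log P(data | unknown bias p, uniform prior on p).

    Integrating p**heads * (1 - p)**tails over [0, 1] gives the Beta
    function B(heads + 1, tails + 1), evaluated here through log-gamma.
    """
    return lgamma(heads + 1) + lgamma(tails + 1) - lgamma(heads + tails + 2)

def select_model(heads, tails):
    """Return the name of the model with the higher marginal likelihood."""
    if log_evidence_fair(heads, tails) >= log_evidence_biased(heads, tails):
        return "fair"
    return "biased"

balanced = select_model(5, 5)   # evidence favours the simpler, fair model
lopsided = select_model(9, 1)   # evidence favours the unknown-bias model
```

The more flexible model is not automatically preferred: integrating over the bias spreads its probability across many possible data sets, so balanced data favours the fair coin.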
Minimum Message Length model selection
The Minimum Message Length heuristic for parameter estimation can be adapted for model selection as well. See "MML Linear Regression" and "Finding Overlapping Components with MML".
Structural Risk Minimization
Selecting the model whose expected future risk is minimal, assuming the parameters of the model will be chosen according to Empirical Risk Minimization. See Vapnik and "An Experimental and Theoretical Comparison of Model Selection Methods".
Nearest-neighbor density estimation
A technique for nonparametric density estimation. The density at any point is inversely proportional to the volume of the smallest ball centered there that contains the k nearest data; in one dimension this means inversely proportional to the distance to the kth nearest datum. For the conditional density of y given x, take the k nearest data points to x and apply any density estimator to their y values. See Duda & Hart.
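A minimal one-dimensional sketch, assuming the usual form k / (n * volume), where in one dimension the volume is the length 2 * r_k of the interval reaching the kth nearest datum. The function name and the sample data are hypothetical.

```python
def knn_density(x, data, k):
    """1-D k-nearest-neighbor density estimate at x: k / (n * 2 * r_k).

    r_k is the distance from x to its kth nearest datum. The estimate is
    undefined (division by zero) if k or more data coincide with x.
    """
    dists = sorted(abs(x - d) for d in data)
    r_k = dists[k - 1]
    return k / (len(data) * 2.0 * r_k)

data = [0.0, 1.0, 2.0, 3.0, 4.0]
p_mid = knn_density(2.0, data, k=3)    # r_k = 1.0
p_far = knn_density(10.0, data, k=3)   # r_k = 8.0, so the density is lower
```

Points far from the data have a large r_k and therefore a small estimated density.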
Nearest-neighbor classification
Nearest-neighbor density estimation applied to predicting the class of an object. To classify measurement x, take the k nearest training measurements and choose the most popular class among them. The quality of this method depends crucially on the distance metric. This method is very sensitive to irrelevant features, so it is usually combined with feature selection. See Dasarathy, "Towards a Better Understanding of Memory-Based Reasoning Systems", and "Flexible metric nearest-neighbor classification".
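The voting rule can be sketched as follows, assuming Euclidean distance as the metric (the text stresses that this choice matters) and a hypothetical function name and toy data set.

```python
from collections import Counter
from math import dist

def knn_classify(x, points, labels, k=3):
    """Return the most common label among the k training points nearest to x.

    Uses Euclidean distance. Ties in the vote go to the class that appears
    first among the nearest neighbors.
    """
    nearest = sorted(range(len(points)), key=lambda i: dist(x, points[i]))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
near_origin = knn_classify((0.5, 0.5), points, labels)
near_five = knn_classify((5.5, 5.5), points, labels)
```

Appending an irrelevant, noisy coordinate to every point would change the distances and could change the vote, which is why feature selection is usually paired with this method.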
Nearest-neighbor regression
A technique for nonparametric regression. Take the k nearest data points in x and use them to recursively perform a regression on y (using any regression algorithm). See the local learning Web site.
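A minimal sketch, assuming one-dimensional x and choosing the simplest possible inner regression, a constant fit (the mean of the neighbors' y values). Any other regression algorithm could be applied to the same k points; the function name and data are hypothetical.

```python
def knn_regress(x, xs, ys, k=3):
    """Predict y at x as the mean y of the k training points nearest in x."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(x - xs[i]))[:k]
    return sum(ys[i] for i in nearest) / k

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]       # y = x**2
y_k1 = knn_regress(2.0, xs, ys, k=1)  # uses only the point at x = 2
y_k3 = knn_regress(2.0, xs, ys, k=3)  # averages the points at x = 1, 2, 3
```

Larger k smooths the prediction at the cost of blurring local structure.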
Kernel density estimation
A technique for nonparametric density estimation. The density is given by centering a kernel function, e.g. a Gaussian bell curve, on each data point and then averaging the functions. The quality of the estimate depends crucially on the kernel function, in particular on its width (bandwidth). Also known as Parzen-window density estimation. For the conditional density of y given x, weight the data by distance to x and use the weights to recursively compute the density of y (using any density estimator). If y is a class variable, then the result is a kernel classifier. See Duda & Hart.
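A minimal one-dimensional sketch with a Gaussian kernel. The kernel width is exposed as a parameter called bandwidth; that name, the function names, and the sample data are assumptions made for illustration.

```python
from math import exp, pi, sqrt

def gaussian_kernel(u):
    """Standard normal density."""
    return exp(-0.5 * u * u) / sqrt(2.0 * pi)

def kde(x, data, bandwidth):
    """Kernel density estimate at x: one scaled kernel per datum, averaged."""
    total = sum(gaussian_kernel((x - d) / bandwidth) for d in data)
    return total / (len(data) * bandwidth)

data = [0.0, 1.0, 2.0]
p_center = kde(1.0, data, bandwidth=0.5)
p_tail = kde(6.0, data, bandwidth=0.5)

# Sanity check: a Riemann sum of the estimate over a wide range is close to 1.
step = 0.01
area = sum(kde(-10.0 + i * step, data, 0.5) for i in range(2001)) * step
```

A small bandwidth gives a spiky estimate with one bump per datum; a large one oversmooths.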
Locally weighted regression
A technique for nonparametric regression, similar to kernel density estimation. To predict y from x, weight the data by distance to x and compute a weighted linear regression. The resulting curve can be nonlinear. This is also called Moving Least Squares. Kernel regression is similar but uses only a weighted average, not a linear regression, and tends to chop off extrema of the function. See the local learning Web site.
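The contrast between the two methods can be sketched in one dimension, assuming Gaussian weights with a hypothetical bandwidth parameter. The function names and toy data are illustrative only.

```python
from math import exp

def _weights(x0, xs, bandwidth):
    """Gaussian weights that decay with distance from the query point x0."""
    return [exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]

def lwr_predict(x0, xs, ys, bandwidth):
    """Locally weighted regression: weighted least-squares line, read at x0."""
    w = _weights(x0, xs, bandwidth)
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)

def kernel_regress(x0, xs, ys, bandwidth):
    """Kernel regression: weighted average of y only, with no slope term."""
    w = _weights(x0, xs, bandwidth)
    return sum(wi * y for wi, y in zip(w, ys)) / sum(w)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]                      # true value at x = 2 is 4
lwr_at_edge = lwr_predict(2.0, xs, ys, 1.0)
avg_at_edge = kernel_regress(2.0, xs, ys, 1.0)
```

At the maximum x = 2 the weighted average is pulled well below 4 by the smaller neighboring values, while the local line extrapolates toward the extremum and lands closer to it.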
