Statistical Learning - Fundamental concepts
Terms in this set (11)
Statistical Learning/Pattern Recognition
An approach to machine intelligence based on statistical modeling of data. Given a statistical model, one applies probability theory and decision theory to obtain an algorithm. This contrasts with using training data merely to select among candidate algorithms, or with using heuristics/"common sense" to design an algorithm.
Features
The measurements that represent the data. The statistical model one uses depends crucially on the choice of features, so it is useful to consider alternative representations of the same measurements (i.e. different features) — for example, different representations of the color values in an image. General techniques for finding new representations include discriminant analysis, principal component analysis, and clustering.
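As an illustrative sketch of finding a new representation (assuming NumPy is available; the function name is hypothetical), principal component analysis can collapse correlated 2-D measurements into a single feature that carries most of the variance:

```python
import numpy as np

def pca_project(X, k=1):
    """Project rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                    # center the data
    cov = np.cov(Xc, rowvar=False)             # feature covariance matrix
    vals, vecs = np.linalg.eigh(cov)           # eigh returns ascending eigenvalues
    top = vecs[:, np.argsort(vals)[::-1][:k]]  # pick the top-k directions
    return Xc @ top                            # new k-dimensional representation

# Points spread mostly along y = 2x: one direction carries nearly all the
# variance, so a single derived feature suffices.
X = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 8.0]])
Z = pca_project(X, k=1)
print(Z.shape)  # (4, 1)
```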
Classification
Assigning a class to a measurement, or equivalently, identifying the probabilistic source of a measurement. The only statistical model that is needed is the conditional model of the class variable given the measurement. This conditional model can be obtained from a joint model or it can be learned directly. The former approach is generative since it models the measurements in each class. It is more work, but it can exploit more prior knowledge, needs less data, is more modular, and can handle missing or corrupted data. Methods include mixture models and Hidden Markov Models. The latter approach is discriminative since it focuses only on discriminating one class from another. It can be more efficient once trained and requires fewer modeling assumptions. Methods include logistic regression, generalized linear classifiers, and nearest-neighbor. See "Discriminative vs Informative Learning".
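A minimal sketch of the generative route described above (the two classes, priors, and Gaussian class-conditionals are illustrative assumptions): model the measurement in each class, then apply Bayes' rule to get the conditional model of the class given the measurement. A discriminative method would instead learn p(class | x) directly.

```python
import math

def gauss(x, mu, sigma):
    """Gaussian density: the per-class model of the measurement."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior(x, classes):
    """classes: {name: (prior, mu, sigma)} -> {name: p(class | x)} via Bayes' rule."""
    joint = {c: p * gauss(x, mu, s) for c, (p, mu, s) in classes.items()}
    z = sum(joint.values())
    return {c: v / z for c, v in joint.items()}

classes = {"low": (0.5, 0.0, 1.0), "high": (0.5, 4.0, 1.0)}
p = posterior(3.5, classes)
print(p)  # "high" dominates for a measurement near 4
```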
Regression
Predicting the value of a random variable y from a measurement x. For example, predicting engine efficiency based on oil pressure. Regression generalizes classification, since y can be any quantity, including a class index; many classification algorithms can be understood as thresholding the output of a regression. As with classification, the conditional model of y can be obtained from a joint model (which includes a model of x) or learned directly. Curve fitting is the common special case where y is assumed to be a deterministic function of x plus additive noise (usually Gaussian). Methods for curve fitting include radial basis functions, feed-forward neural networks, and mixtures of experts.
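The curve-fitting special case can be sketched in a few lines (a stdlib-only illustration; assume y is linear in x plus Gaussian noise, and recover the line by least squares):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Noisy samples of y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))  # close to 2 and 1
```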
Nonparametric regression/density estimation
An approach to regression/density estimation that doesn't require much prior knowledge but only a large amount of data. For regression, it includes nearest-neighbor, weighted average, and locally weighted regression. For density estimation, it includes histograms, kernel smoothing, and nearest-neighbor.
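The weighted-average idea mentioned above can be sketched as a kernel-weighted regression (an illustrative stdlib-only example; the Gaussian kernel and bandwidth are assumptions): the prediction at a query point is an average of nearby observed y values, with no parametric model of the curve.

```python
import math

def kernel_regress(x, data, bandwidth=1.0):
    """data: list of (x_i, y_i); Gaussian-kernel weighted average of the y_i."""
    ws = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi, _ in data]
    return sum(w * yi for w, (_, yi) in zip(ws, data)) / sum(ws)

data = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0), (3.0, 9.0)]  # samples of y = x^2
y_hat = kernel_regress(1.5, data, bandwidth=0.5)
print(round(y_hat, 2))  # between the neighboring values 1 and 4
```

A smaller bandwidth trusts only the nearest points (approaching nearest-neighbor); a larger one smooths more aggressively.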
Parameter Estimation
Density estimation when the density is assumed to be in a specific parametric family. Special cases include maximum likelihood, maximum a posteriori, unbiased estimation, and predictive estimation. See the section on Parameter estimation techniques.
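Two of the special cases above can be contrasted in a toy setting (illustrative assumptions: a Gaussian family with known unit variance, and a N(0, tau^2) prior on the mean for the MAP estimate):

```python
def mle_mean(xs):
    """Maximum likelihood estimate of a Gaussian mean: the sample average."""
    return sum(xs) / len(xs)

def map_mean(xs, prior_var=1.0):
    """MAP estimate under a N(0, prior_var) prior: shrunk toward the prior mean 0."""
    n = len(xs)
    return sum(xs) / (n + 1.0 / prior_var)

xs = [2.0, 2.5, 1.5, 2.0]
print(mle_mean(xs), map_mean(xs))  # the MAP estimate is pulled toward 0
```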
Model selection
Choosing the parametric family to use for density estimation. This is harder than parameter estimation since you have to take into account every member of each family in order to choose the best family. Considering only the best member of each family is not sufficient (one would tend to choose the biggest family). See the section on Model selection techniques.
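One common way to avoid simply choosing the biggest family is to penalize family size; as an illustrative sketch (the BIC-style penalty and the two families, constant vs. linear, are assumptions, not the only option), compare a penalized fit score across families:

```python
import math

def bic(residual_ss, n, k):
    """BIC-style score: n * log(RSS/n) + k * log(n); lower is better."""
    return n * math.log(residual_ss / n) + k * math.log(n)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 1.1, 2.0, 2.9]          # nearly linear data

# Family 1: constants (one parameter)
mean_y = sum(ys) / len(ys)
rss_const = sum((y - mean_y) ** 2 for y in ys)

# Family 2: lines (two parameters), fit by least squares
mx = sum(xs) / len(xs)
sxx = sum((x - mx) ** 2 for x in xs)
slope = sum((x - mx) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
intercept = mean_y - slope * mx
rss_line = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

line_wins = bic(rss_line, 4, 2) < bic(rss_const, 4, 1)
print(line_wins)  # True: for this data the line is worth its extra parameter
```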
Independence Diagram
A graphical way of expressing the conditional independence relationships among a set of random variables. Such diagrams cannot encode every possible form of conditional independence, but they go a long way toward this end. They are also called "Bayesian networks." See "Independence Diagrams", A Brief Introduction to Graphical Models and Bayesian Networks, Course Notes on Bayesian Networks, and Pearl.
Active learning
Determining the optimal measurements to make under a cost constraint. A measurement is "optimal" when it is expected to give the most new information about the parameters of a model. Active learning is thus an application of decision theory to the process of learning. It is also known as experiment design. See "Employing EM in Pool-Based Active Learning for Text Classification", "Selective sampling using the Query by Committee algorithm", "Reinforcement Learning: A Survey", Box&Draper, and Raiffa&Schlaifer.
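One widely used approximation to "most new information" is uncertainty sampling; as an illustrative sketch (the pool, the threshold model, and the function names are assumptions), the learner queries the unlabeled point whose predicted class probability is closest to 0.5:

```python
import math

def query(pool, predict_prob):
    """Return the pool point with the most uncertain prediction."""
    return min(pool, key=lambda x: abs(predict_prob(x) - 0.5))

# Toy current model: a soft threshold classifier at x = 2.
def predict_prob(x):
    return 1.0 / (1.0 + math.exp(-(x - 2.0)))

pool = [0.0, 1.9, 5.0, 3.5]
chosen = query(pool, predict_prob)
print(chosen)  # 1.9: the point sitting right at the decision boundary
```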
Reinforcement learning
Learning how to act optimally in a given environment, especially with delayed and nondeterministic rewards. It is equivalent to adaptive control. There are two interleaved tasks: modeling the environment and making optimal decisions based on the model. The first task is a statistical modeling problem and is handled using the techniques listed in this glossary. The second task is a decision theory problem: converting the expectation of delayed reward into an immediate action. Since reinforcement learning requires exploration, it is often combined with active learning, though this is not essential. Most learning problems that humans face are reinforcement learning problems, e.g. deciding which melon to buy, which coat to wear outside today, or which friends to have. See "Reinforcement Learning: A Survey" and "Reinforcement Learning: A Tutorial".
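The second task — converting delayed reward into immediate action — can be sketched with value iteration on a toy MDP (an illustrative assumption: a deterministic 4-state chain where only reaching the last state pays reward 1):

```python
def value_iteration(n_states=4, gamma=0.9, sweeps=50):
    """Compute state values for a chain MDP; the last state is terminal."""
    V = [0.0] * n_states
    for _ in range(sweeps):
        for s in range(n_states - 1):
            # Two actions: step right (reward 1 on entering the last state)
            # or step left (no reward); take the better discounted return.
            right = (1.0 if s + 1 == n_states - 1 else 0.0) + gamma * V[s + 1]
            left = gamma * V[max(s - 1, 0)]
            V[s] = max(left, right)
    return V

V = value_iteration()
print([round(v, 2) for v in V])  # values grow toward the rewarding state
```

Acting greedily with respect to these values converts the delayed reward into an immediate decision at every state: always move right.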
No free lunch
The point that all statistical models are necessarily biased in one way or another, and that no single bias is globally optimal. Mitchell, and later Wolpert, emphasized this point in order to stop useless comparisons between learning algorithms that were using different priors (like Euclidean nearest neighbor vs. axis-parallel decision trees). The real way to evaluate algorithms is how well they can utilize prior knowledge given to them, i.e. how well they can approximate Bayesian learning. See "The Need for Biases in Learning Generalizations" in Readings in Machine Learning, "The lack of a priori distinctions between learning algorithms", "Bayesian regression filters and the issue of priors" (the "issue of priors" part), and Cross-validation.