CIS 463 Final
Terms in this set (62)
Data
a collection of facts usually obtained as the result of experiences, observations, or experiments.
ANN (Artificial Neural Networks)
computer-based programs whose function is to model a problem space based on trial and error.
Input Layer (ANN)
layer that receives data.
Internal/Hidden Layer (ANN)
layer that processes data.
Output Layer (ANN)
Layer that relays final result.
Unsupervised learning
machine learning that draws inferences from data sets without labeled responses. No labeled examples are provided.
Supervised Learning
machine learning by inferring a function from labeled training data. A set of training examples is provided.
Backpropagation
involves changing input weights to neurodes based on errors in outputs, until those errors approach zero.
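As a rough illustration of the weight-adjustment idea, here is a minimal sketch of backpropagation on a tiny network with two inputs, two hidden neurodes, and one output. The network size, starting weights, learning rate, epoch count, and the OR training data are all assumptions chosen for the example, not part of the card.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    # Each hidden neurode holds [weight_x0, weight_x1, bias].
    hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
    # w_out holds one weight per hidden neurode plus a trailing bias.
    net = sum(w * h for w, h in zip(w_out, hidden)) + w_out[-1]
    return hidden, sigmoid(net)

def total_error(data, w_hidden, w_out):
    return sum((target - forward(x, w_hidden, w_out)[1]) ** 2 for x, target in data)

def train(data, w_hidden, w_out, rate=0.5, epochs=2000):
    for _ in range(epochs):
        for x, target in data:
            hidden, output = forward(x, w_hidden, w_out)
            # Error signal at the output, scaled by the sigmoid slope.
            delta_out = (output - target) * output * (1 - output)
            # Propagate the error signal back through the output weights.
            delta_hidden = [delta_out * w_out[j] * h * (1 - h)
                            for j, h in enumerate(hidden)]
            for j, h in enumerate(hidden):
                w_out[j] -= rate * delta_out * h
            w_out[-1] -= rate * delta_out
            for j, w in enumerate(w_hidden):
                w[0] -= rate * delta_hidden[j] * x[0]
                w[1] -= rate * delta_hidden[j] * x[1]
                w[2] -= rate * delta_hidden[j]

# Hypothetical training set: the logical OR function.
or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w_hidden = [[0.1, -0.2, 0.0], [0.3, 0.4, 0.0]]  # assumed starting weights
w_out = [0.2, -0.1, 0.0]
error_before = total_error(or_data, w_hidden, w_out)
train(or_data, w_hidden, w_out)
error_after = total_error(or_data, w_hidden, w_out)
```

Comparing `error_before` with `error_after` shows the squared output error shrinking as the weights are adjusted.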
Simple Split
splitting data into training and test sets to develop and score a model in order to estimate its prediction accuracy.
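A simple split might look like the sketch below. The 30% test fraction, the fixed seed, and the function name `simple_split` are assumptions for illustration.

```python
import random

def simple_split(rows, test_fraction=0.3, seed=0):
    # Shuffle a copy so the caller's data is left untouched.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test rows become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]

train_rows, test_rows = simple_split(range(10))
```

The model is then fitted on `train_rows` only and scored on `test_rows`.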
Jackknifing
an estimation methodology in which the parameter is estimated by systematically leaving out each observation from a dataset, calculating the estimate on the remaining observations, and then averaging these calculations.
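The leave-out-and-average procedure can be sketched as follows, here applied to the sample mean. The data values and helper names are made up for the example.

```python
def jackknife(data, estimator):
    # Recompute the estimate once per observation, each time leaving it out.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(len(data))]
    # The jackknife estimate is the average of those partial estimates.
    return sum(leave_one_out) / len(leave_one_out), leave_one_out

def mean(values):
    return sum(values) / len(values)

sample = [2.0, 4.0, 6.0, 8.0]  # hypothetical observations
jackknife_mean, partial_means = jackknife(sample, mean)
```

For the mean, the jackknife average coincides with the ordinary sample mean; the spread of `partial_means` is what is typically used to gauge the estimator's variability.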
Area under ROC curve
evaluation measure used to determine which of the candidate models predicts the classes best; true positive rates are plotted against false positive rates, and the area under the resulting curve is compared across models.
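A minimal sketch of the calculation: sweep a threshold down the ranked scores, record (false positive rate, true positive rate) points, and sum the area with the trapezoid rule. It assumes labels are 0 or 1 and that all scores are distinct; the labels and scores are invented.

```python
def roc_points(labels, scores):
    # Rank observations from highest to lowest score.
    ranked = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _score, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under(points):
    # Trapezoid rule over consecutive ROC points.
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0]            # hypothetical true classes
scores = [0.9, 0.8, 0.7, 0.6, 0.2]  # hypothetical model scores
auc = area_under(roc_points(labels, scores))
```

An area of 1.0 corresponds to a perfect ranking and 0.5 to random guessing.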
Leave-one-out
an estimation methodology that uses 1 observation as the validation set and the remaining observations as the training set, repeated for each observation.
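A leave-one-out loop could be sketched like this. The 1-nearest-neighbor predictor and the one-dimensional data are stand-ins chosen to keep the example short; any model could take their place.

```python
def nearest_neighbor_predict(training, x):
    # Predict the label of the closest training observation.
    nearest = min(training, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def leave_one_out_accuracy(data, predict):
    correct = 0
    for i, (x, label) in enumerate(data):
        # Observation i is the validation set; everything else is training data.
        training = data[:i] + data[i + 1:]
        if predict(training, x) == label:
            correct += 1
    return correct / len(data)

# Hypothetical (value, class) observations.
data = [(0.9, "a"), (1.0, "a"), (1.2, "a"), (3.2, "a"),
        (4.8, "b"), (5.0, "b"), (5.3, "b")]
accuracy = leave_one_out_accuracy(data, nearest_neighbor_predict)
```

Here the observation at 3.2 sits closer to the "b" group, so it is the one misclassified case.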
Artificial Intelligence
The design and development of computer systems that exhibit intelligent behavior.
Turing Test
test that sees if a human can tell whether something is artificial intelligence or another human.
Knowledge Representation Systems
Capture existing expert knowledge and use it to consult end-users and provide decision support.
Machine Learning
Algorithms that use mathematical or logical techniques for finding patterns in data and discovering or creating new knowledge.
Data Mining
Using statistical, mathematical, AI, and machine learning techniques to extract useful information and subsequent knowledge from large databases.
Deduction
If p, then q. We are given the rule and the fact p; if the rule and the fact are correct, then the conclusion q will be correct.
Abduction
If p, then q. Given the conclusion q, we conclude that p must also be true. Referred to as "affirming the consequent".
Rule Based Systems (Expert Systems)
a system built from a set of assertions (facts) and a set of "if-then" rules that specify how to act upon those assertions. Implements deductive and abductive logic.
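A toy sketch of both reasoning modes: forward chaining applies rules to known assertions (deduction), and a lookup from an observed conclusion back to candidate premises stands in for abduction. The rule format and the medical-style facts are hypothetical.

```python
def forward_chain(facts, rules):
    # Deduction: keep firing rules whose premises are all known.
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known

def abduce(observation, rules):
    # Abduction: list premise sets that would explain the observation.
    return [premises for premises, conclusion in rules if conclusion == observation]

# Each rule is (IF premises, THEN conclusion); all names are made up.
rules = [
    (("has_fever", "has_cough"), "has_flu"),
    (("has_flu",), "should_rest"),
]
derived = forward_chain({"has_fever", "has_cough"}, rules)
explanations = abduce("should_rest", rules)
```

The abductive result is only a candidate explanation, since other causes could produce the same observation.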
Induction
If p, then q. We create the rule, and are not given it. Observations or premises are viewed as strong evidence for the truth of the conclusion. Highly error prone.
Decision Tree
structure that includes a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds a class label. The topmost node in the tree is the root node.
Naive Bayes
classification technique where special classifiers can predict class membership probabilities such as the probability that a given tuple belongs to a particular class.
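The sketch below assumes the usual naive Bayes formulation, which the card does not spell out: Bayes' theorem with features treated as conditionally independent given the class, plus add-one smoothing. The weather-style data and function names are invented.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] = number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values

def class_probabilities(model, row):
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior probability of the class
        for i, value in enumerate(row):
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score *= (feature_counts[label][i][value] + 1) / (n + len(values[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(rows, labels)
probs = class_probabilities(model, ("rainy", "cool"))
```

`probs` maps each class to the estimated probability that the tuple belongs to it.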
Clustering
a form of unsupervised learning, where there are no training examples provided. The goal is to group the data into "clusters" or "classes" of similar items.
Association Rule
function that discovers the probability of the co-occurrence of items in a collection.
Time Series
series of data points indexed (or listed or graphed) in time order; can be used for clustering, classification, query by content, anomaly detection as well as forecasting.
ID3 Decision Tree Algorithm
The goal of this algorithm is to find rules resulting in YES or NO values. Generates a tree, where each path of the tree represents a rule. The leaf node is the THEN part of the rule, and the nodes leading to it are the ANDs of attribute-value combinations in the IF part of the rule.
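To illustrate the path-to-rule reading, the sketch below walks a small hand-built tree and emits one IF/THEN rule per path. The tree is a hypothetical example, not the output of an actual ID3 run, and the nested-dict layout is an assumption.

```python
# Internal nodes test an attribute; leaves are plain "YES"/"NO" strings.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "windy", "branches": {"yes": "NO", "no": "YES"}},
        "rainy": "NO",
    },
}

def extract_rules(node, conditions=()):
    if not isinstance(node, dict):
        # Leaf: the accumulated conditions are the IF part, the leaf is the THEN part.
        return [(list(conditions), node)]
    rules = []
    for value, child in node["branches"].items():
        rules.extend(extract_rules(child, conditions + ((node["attribute"], value),)))
    return rules

rules = extract_rules(tree)
for conditions, outcome in rules:
    if_part = " AND ".join(f"{attr} = {value}" for attr, value in conditions)
    print(f"IF {if_part} THEN {outcome}")
```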
Entropy
Lack of predictability; mixture or chaos.
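In decision-tree learning, entropy is commonly quantified with Shannon's formula, and an attribute is scored by how much splitting on it reduces entropy (information gain). The sketch below assumes that formulation; the rows are invented, with the class label in the last position.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute_index):
    labels = [row[-1] for row in rows]
    # Group the class labels by the value of the chosen attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute_index], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

rows = [("sunny", "weak", "yes"), ("sunny", "strong", "yes"),
        ("rainy", "weak", "no"), ("rainy", "strong", "no")]
mixed = entropy(["yes", "yes", "no", "no"])  # evenly mixed classes
pure = entropy(["yes"] * 4)                  # a single class
gain_outlook = information_gain(rows, 0)
gain_wind = information_gain(rows, 1)
```

An evenly mixed set has the highest entropy and a pure set has none; here the first attribute separates the classes completely while the second tells us nothing.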
Apriori
Algorithm used in association rule mining that finds item subsets that are common to at least a minimum number of the item sets (transactions).
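A compact sketch of the level-wise search: keep the itemsets that meet a minimum count, join them into larger candidates, and prune any candidate with an infrequent subset. The transactions and the `min_support` threshold are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if count(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = count(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must already be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                   and count(c) >= min_support}
    return frequent

baskets = [{"bread", "milk"}, {"bread", "diapers", "beer"},
           {"milk", "diapers", "beer"}, {"bread", "milk", "diapers"},
           {"bread", "milk", "beer"}]
frequent = apriori(baskets, min_support=3)
```

`frequent` maps each qualifying itemset to the number of transactions that contain it.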
Lift
In association rule mining, a score that indicates importance, calculated as the probability of the items occurring together divided by the product of the probabilities of the individual items in the item set.
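As a worked sketch, lift can be computed from transaction counts: the joint support of the items divided by the product of their individual supports. The baskets and function names are made up for illustration.

```python
def support(transactions, items):
    # Fraction of transactions that contain every item in `items`.
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def lift(transactions, items):
    joint = support(transactions, items)
    independent = 1.0
    for item in items:
        independent *= support(transactions, [item])
    return joint / independent

baskets = [{"bread", "milk"}, {"bread", "diapers", "beer"},
           {"milk", "diapers", "beer"}, {"bread", "milk", "diapers"},
           {"bread", "milk", "beer"}]
lift_bread_milk = lift(baskets, ["bread", "milk"])      # 0.6 / (0.8 * 0.8)
lift_diapers_beer = lift(baskets, ["diapers", "beer"])  # 0.4 / (0.6 * 0.6)
```

A lift above 1 suggests the items co-occur more often than independence would predict; a lift below 1 suggests less often.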
K-Means Clustering Algorithm
k random points are chosen as initial cluster centers, each point is assigned to its nearest cluster center, the centers are recomputed as the mean of their assigned points, and this process is repeated until a convergence criterion is met.
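A minimal sketch of the loop on one-dimensional data. The standard algorithm also recomputes each center as the mean of its assigned points between assignment passes, which is what the code does; the data, seed, and stopping rule (centers no longer change) are assumptions.

```python
import random

def k_means(points, k, seed=0, max_iterations=100):
    # Start from k randomly chosen data points as cluster centers.
    centers = random.Random(seed).sample(points, k)
    clusters = []
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Move each center to the mean of its points (keep it if the cluster is empty).
        new_centers = [sum(c) / len(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # two obvious groups
centers, clusters = k_means(points, k=2)
```

On data like this, with two well-separated groups, the centers settle on the two group means.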
Co-reference resolution
finding all expressions that refer to the same entity in a text (e.g. finding connections between nouns and their associated pronouns).
Tokenizing
uses spaces as delimiters between words to split up the sentence into individual words.
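The space-delimited approach can be sketched in a couple of lines. The function name is an assumption, and `str.split()` with no argument splits on any run of whitespace, not only single spaces.

```python
def tokenize(sentence):
    # Split on whitespace; punctuation stays attached to the neighboring word.
    return sentence.split()

tokens = tokenize("The cat sat on the mat.")
```

Note that the final token keeps its period, which is one reason practical tokenizers go beyond splitting on spaces.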
Named Entity Recognition
models and classes for recognizing the names of people, places, organizations, time and date.
Constituency parsing
hierarchies of phrases, subphrases, etc.
Dependency parsing
Dependency relationships (binary predicates) between words in a sentence.
Nominal subject
nsubj
Nominal Passive Subject
nsubjpass
Direct object
dobj
Adjectival modifier
amod
Conjunct (and)
conj_and
Prepositional modifier
prep
Noun Compound Modifier
nn
Participial modifier
partmod
MDX
language for querying OLAP cubes and customizing calculations in cube-building.
Query Axis
Specified in the SELECT clause of an MDX query, where you select what goes on rows and columns; in principle a query can have many axes.
Slicer Axes
In the WHERE clause of an MDX query, this is how you "slice" the cube.
Tuple
A collection of one or more members (from different Dimension hierarchies or Measure Groups)
Measure (or member)
An item in a dimension or measure group. Distinct value for a dimension's attribute or distinct measure. Lowest level of reference for a cube.
Set
Ordered collection of zero, one or more tuples
Data Visualization
Technologies that support visualization and interpretation of data and information...includes digital images, GIS, GUI, graphs, virtual reality, dimensional presentations, videos, and animation.
Bar Charts
To display changes over time, comparisons, deviations, parts of the whole, rankings, time series.
Line Graphs
To display changes over time, comparisons, deviations, parts of the whole, rankings, time series.
Pie Chart
To display part of the whole or proportions.
Radar Chart
Useful for showing multidimensional data in a 2-dimensional graph; each dimension is a radius emanating from the center of the graph.
Concept Map
Shows relationships between concepts; Consists of nodes and links
TreeMap
Displays hierarchical data as a set of nested rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing sub-branches.
Word Clouds
Visual representation of word frequency in a body of text.
Business Performance Management (BPM)
Using reporting tools to monitor, evaluate, and improve performance.
Balanced Scorecard
BPM methodology that evaluates organizational performance based not only on financial measurements but non-financial organizational performance as well.
Six Sigma
a disciplined, data-driven approach and methodology for eliminating defects (driving toward six standard deviations between the mean and the nearest specification limit) in any process.
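The "six standard deviations" target can be checked with a small calculation: the distance from the process mean to the nearest specification limit, expressed in standard deviations. The function name and the process figures below are hypothetical.

```python
def sigma_level(mean, std_dev, lower_limit, upper_limit):
    # Distance from the mean to the nearest specification limit, in std devs.
    nearest_distance = min(upper_limit - mean, mean - lower_limit)
    return nearest_distance / std_dev

# Hypothetical process: target 10.0, limits 7.0 to 13.0, std dev 0.5.
level = sigma_level(mean=10.0, std_dev=0.5, lower_limit=7.0, upper_limit=13.0)
```

A result of 6 or more means the process meets the six-sigma target under this simple reading.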
Dashboard
interactive reporting tools that use data visualizations.