Upgrade to remove ads
3.1 Visualizing Data
Terms in this set (67)
A data point or smaller pattern that does not follow the general pattern of a data set.
A visualization that compares a quantitative y-variable across different categories of an x-variable. Side-by-side and stacked bar graphs allow for additional comparisons of the y-values of categories-within-categories.
Data that are beyond the resources of one computer to store, especially intensive to analyze, or difficult to reconcile among complementary data sets.
A visualization showing how a set of quantitative values are distributed across the range of values. Shows less detail than a histogram, but provides more explicit information about quartiles and outliers.
Has values that have no natural order.
Center of a Distribution
Described by the mean, median, or mode, it is in some way the middle of the distribution.
The percentage of viewers that click on an ad.
Light-sensitive neurons in the retina responsible for color vision.
Data Mining, KDD
Knowledge Discovery in Data (KDD) uses computational power to find patterns in large data sets.
When multiple processors use the same instructions and apply them to subsets of the data.
Professional skill set that combines skills in transforming data, exploring data with visualization, using statistics, and communicating with graphic arts.
Data that has been scrubbed of name, address, and other information that makes it personally identifiable data.
Data calculated from other data. Raw data are measured or collected.
A data structure in which unordered keys each point to a value.
When storage or processing is handled by multiple independent machines in a coordinated fashion.
Electronic Frontier Foundation
A non-profit that advocates for privacy on the Internet.
End User License Agreement is the contract between a user and a company selling software for installation on the user's computer.
Exploratory Data Analysis
Visualizing data seeking patterns, contrasted with Statistical Analysis in which mathematics is used to determine the likelihood a pattern exists by chance.
A system designed to work when components fail.
A rectangular portion of the screen in a web browser.
The number of times something has occurred.
Abstracting knowledge or a solution to apply to a wider range of questions or problems.
Graphics Processing Unit (GPU)
A set of processors on a video card in a computer performing data parallel calculations to render objects (windows, etc.) on screen.
Placing data or constant values directly in programming code.
A visualization showing how a set of quantitative values are distributed across the range of values.
A single viewing of an ad.
Calculates probabilities when trying to generalize observations from sample data to apply to the population that was sampled.
An operation on data works in-place if it is able to perform the operation without setting aside memory to store a new copy of the data.
The 75%ile (percentile) measurement minus the 25%ile measurement.
Intervals, Classes, or Bins
The individual categories in a histogram.
The "average" obtained by dividing the sum of data by the number of data elements.
The middle value in a set of measurements placed in order.
The most common value in a set of measurements.
Monte Carlo Simulation
Using random numbers to simulate phenomena that has variation.
Cells that send electrical signals as output based on electrical signal or sensation as input.
The classic "bell curve" that is commonly observed because of the Central Limit Theorem: blending multiple effects from any distributions creates a normal distribution.
The portion of cortex at the back of the brain; processes vision.
Opt-In, Opt-Out Clauses
Clauses that let a user customize an agreement or interface. "Opt in" clauses do not apply by default. "Opt out" clauses apply by default.
Using two or more CPUs simultaneously.
A data visualization in which a circle is broken into parts. The parts add up to 100% of some quantity.
The entire set of measurements that can be made or the theoretical infinite set of measurements that are being sampled.
Tells details about a company offering a service on the web, details about what data the company can collect from a user, whether the company can sell that data, and so on.
An interface where a user is given partial control of what data about themselves is collected and who can access it.
Has values that have a meaningful order.
Maximum value minus minimum value.
When two or more machines fulfill the same purpose (storage, processing, serving a protocol, etc.).
Reattaching personal identities to de-identified data, often because Big Data make anonymity unlikely.
How often something occurs as a percentage of the time.
Used in a formula in a spreadsheet cell, a relative reference is a direction from the formula's spreadsheet cell to the spreadsheet cell with the data.
Light sensitive tissue at the back of the eyeball.
Light-sensitive neurons in the retina responsible for black and white vision under low lighting conditions.
The set of values in a sample of measurements, as opposed to the population distribution.
A visualization in which each point plotted shows at least 2 variables: the x- and y-coordinates.
Data that is considered private, such as financial, educational, and health records.
Shape of Distribution
Symmetric, positively-skewed (with a heavier right tail), or negatively-skewed (with a heavier left tail).
Spread of a Distribution
Described by Range, Interquartile Range, or Standard Deviation, the spread says how "wide" the distribution is.
The root-mean-square (RMS) deviation from the mean. Used to describe the width, especially of the normal distribution.
Advertising delivered to a computer user based on advertiser's knowledge about the user.
When multiple processors have different instructions, contributing to a job by each completing separate threads.
Terms of Service
The legal contract between a user and a company offering a service on the Web.
The Standard Normal Distribution
A normal distribution with mean 0 and standard deviation 1.
Separate, independent tasks within a job.
Creating a new data set by applying a calculation or algorithm to another data set.
Like the probability of rolling a 1, 2, 3, 4, 5, or 6, the uniform distribution has the same probability for measuring each value.
The percentage of viewers that visit an advertiser's site, either by clicking on an ad at the time advertised, or by visiting the advertiser's site later.
A graphical representation of data.
YOU MIGHT ALSO LIKE...
CSE 3.1 Key Terms
Carnegie Algebra I Chapter 8: Analyzing…
C784 Module 4: Descriptive Statistics for a Single…
info 320 test 1
OTHER SETS BY THIS CREATOR
Unit 5 - Applications of Integration
Unit 4 - Integration
MTH107 Unit 1 - Sampling and Data
Unit 3 - Applications of Differentiation