3.1 Visualizing Data
Anomaly
A data point or smaller pattern that does not follow the general pattern of a data set.
Bar Graph
A visualization that compares a quantitative y-variable across the categories of an x-variable. Side-by-side and stacked bar graphs also allow comparisons of y-values for categories within categories.
Big Data
Data that exceed the resources of one computer to store, are especially computationally intensive to analyze, or are difficult to reconcile across complementary data sets.
Box Plot
A visualization showing how a set of quantitative values is distributed across its range. It shows less detail than a histogram but gives more explicit information about quartiles and outliers.
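A box plot is drawn from a five-number summary (minimum, first quartile, median, third quartile, maximum). The sketch below computes that summary and flags outliers with the common 1.5 × IQR rule. Assumptions: the function names and data are hypothetical, and quartile conventions vary between tools, so a spreadsheet may report slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # statistics.quantiles(n=4) returns the three cut points Q1, Q2, Q3.
    # Its default method is "exclusive"; other tools may use other conventions.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

def outliers(values):
    """Values more than 1.5 * IQR below Q1 or above Q3 (a common box-plot rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical measurements
print(five_number_summary(data))      # (2, 4.0, 6.0, 8.5, 30)
print(outliers(data))                 # [30]
```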
Categorical Variable
Has values that are categories (labels) with no natural order, such as eye color.
Center of a Distribution
The "middle" of a distribution in some sense, described by the mean, median, or mode.
Click-Through Rate
The percentage of viewers who click on an ad: clicks divided by impressions.
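As arithmetic, click-through rate is clicks divided by impressions, times 100. This minimal sketch assumes each impression counts as one viewing. The function name and campaign numbers are hypothetical.

```python
def click_through_rate(clicks, impressions):
    """Percentage of ad impressions that led to a click."""
    if impressions == 0:
        raise ValueError("no impressions recorded")
    return 100 * clicks / impressions

# Hypothetical campaign: 25 clicks out of 2,000 impressions.
print(click_through_rate(25, 2000))  # 1.25 (percent)
```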
Cones
Light-sensitive neurons in the retina responsible for color vision.
Data Mining, KDD
Knowledge Discovery in Data (KDD) uses computational power to find patterns in large data sets.
Data Parallel
When multiple processors execute the same instructions, each applying them to a different subset of the data.
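The data-parallel pattern splits the data into chunks, runs the same function on every chunk, and then combines the partial results. This sketch uses Python's `concurrent.futures.ThreadPoolExecutor` to show the pattern. One caveat: in CPython, threads do not run Python bytecode at the same time, so a real speedup for CPU-bound work would usually need `ProcessPoolExecutor`. The helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(values, n_chunks):
    """Split values into at most n_chunks contiguous slices of similar size."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division, at least 1
    return [values[i:i + size] for i in range(0, len(values), size)]

def sum_of_squares(part):
    # Every worker runs this same instruction sequence on its own subset.
    return sum(x * x for x in part)

data = list(range(1, 101))
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(sum_of_squares, chunk(data, 4)))

total = sum(partial_sums)  # combine the partial results: 338350
print(partial_sums, total)
```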
Data Skills
A professional skill set that combines transforming data, exploring data with visualization, using statistics, and communicating with graphic arts.
De-identified Data
Data from which names, addresses, and other personally identifying information have been removed.
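As a minimal sketch, de-identifying a record can be as simple as dropping the directly identifying fields. The field names and record below are hypothetical. Fields that remain, such as ZIP code and age, can still make re-identification possible when combined with other data sets.

```python
# Hypothetical set of directly identifying field names.
IDENTIFYING_FIELDS = {"name", "address", "email", "phone"}

def de_identify(record):
    """Return a copy of record without the directly identifying fields."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}

patient = {"name": "A. Smith", "address": "12 Elm St", "zip": "02139",
           "age": 34, "diagnosis": "flu"}
print(de_identify(patient))  # {'zip': '02139', 'age': 34, 'diagnosis': 'flu'}
```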
Derived Data
Data calculated from other data, as opposed to raw data, which are measured or collected.
Device Fingerprinting
An advertiser's use of details about a user's hardware and software (as reported through JavaScript) to recognize the user when they return.
Dictionary
A data structure in which each unique key maps to a value; values are retrieved by key rather than by position.
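In Python, a dictionary (`dict`) is a natural way to tally frequencies: each distinct item becomes a key, and its count is the value. Since Python 3.7, dicts also remember insertion order, but values are still looked up by key. The function name is illustrative.

```python
def count_words(words):
    """Map each distinct word (key) to the number of times it appears (value)."""
    counts = {}
    for word in words:
        # get() returns the default 0 the first time a key is seen.
        counts[word] = counts.get(word, 0) + 1
    return counts

counts = count_words(["red", "blue", "red", "green", "red"])
print(counts["red"])  # 3, looked up by key rather than by position
```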
Distributed
When storage or processing is handled by multiple independent machines in a coordinated fashion.
Electronic Frontier Foundation
A nonprofit organization that advocates for privacy and other civil liberties on the Internet.
EULA
End User License Agreement: the contract between a user and a company selling software to be installed on the user's computer.
Exploratory Data Analysis
Visualizing data to look for patterns. Contrasted with statistical analysis, in which mathematics is used to determine how likely it is that a pattern arose by chance.
Fault-Tolerant
A system designed to keep working even when some of its components fail.
Frame
A rectangular portion of the screen in a web browser.
Frequency
The number of times something has occurred.
Generalization
Abstracting knowledge or a solution to apply to a wider range of questions or problems.
Graphics Processing Unit (GPU)
A set of processors, typically on a computer's video card, that perform data-parallel calculations to render objects such as windows on screen.
Hard-Coding
Placing data or constant values directly in programming code.
Histogram
A visualization showing how a set of quantitative values is distributed across the range of values, using bars whose heights show how many values fall into each interval (bin).
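Building a histogram comes down to counting how many values land in each bin. This sketch uses equal-width bins and prints text bars rather than a graphic. Assumptions: the function name, bin range, and scores are hypothetical, and each bin includes its lower edge, while the final bin also includes the upper edge.

```python
def histogram_counts(values, low, high, n_bins):
    """Count the values that fall in each of n_bins equal-width bins on [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v <= high:                 # ignore values outside the range
            i = int((v - low) / width)
            counts[min(i, n_bins - 1)] += 1  # v == high goes in the last bin
    return counts

LOW, HIGH, BINS = 50, 100, 5
scores = [55, 62, 68, 71, 74, 75, 79, 83, 88, 95]  # hypothetical test scores
counts = histogram_counts(scores, LOW, HIGH, BINS)

# Text "bars": one # per value in the bin.
width = (HIGH - LOW) // BINS
for b, c in enumerate(counts):
    start = LOW + b * width
    print(f"{start}-{start + width}: {'#' * c}")
```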
Impression
A single viewing of an ad.
Inferential Statistics
Statistics that calculate probabilities in order to generalize observations from sample data to the population that was sampled.
In-Place
An operation on data works in place if it can perform the operation without setting aside memory to store a new copy of the data.
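The contrast is easiest to see with reversal. Slicing builds a reversed copy of the list. Swapping elements from both ends toward the middle reverses the same list using only constant extra memory. The function names are illustrative.

```python
def reversed_copy(values):
    """Return a new reversed list; needs memory for a full copy."""
    return values[::-1]

def reverse_in_place(values):
    """Reverse values by swapping ends inward; no second list is created."""
    i, j = 0, len(values) - 1
    while i < j:
        values[i], values[j] = values[j], values[i]
        i += 1
        j -= 1

data = [1, 2, 3, 4, 5]
print(reversed_copy(data), data)  # [5, 4, 3, 2, 1] [1, 2, 3, 4, 5]
reverse_in_place(data)
print(data)                       # [5, 4, 3, 2, 1]
```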
Interquartile Range
The 75th percentile measurement minus the 25th percentile measurement; the spread of the middle half of the data.
Intervals, Classes, or Bins
The intervals into which a histogram divides the range of values; each bar counts the values that fall in one bin.
Mean
The "average" obtained by dividing the sum of the data values by the number of values.
Median
The middle value in a set of measurements placed in order.
Mode
The most common value in a set of measurements.
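The three measures of center can be computed directly. The mean is calculated by hand to match the definition, while the median and mode use Python's standard `statistics` module. The data are hypothetical.

```python
import statistics

values = [2, 3, 3, 5, 7, 10]  # hypothetical measurements

mean = sum(values) / len(values)    # 30 / 6 = 5.0
median = statistics.median(values)  # even count: average of 3 and 5 = 4.0
mode = statistics.mode(values)      # 3 appears most often
print(mean, median, mode)
```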
Monte Carlo Simulation
Using random numbers to simulate phenomena that have variation.
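As a minimal Monte Carlo sketch, we can estimate the probability that two fair dice sum to 7 by simulating many rolls and counting how often it happens. The exact answer is 6/36, so the estimate can be checked. The generator is seeded so the run is repeatable, and the function name is hypothetical.

```python
import random

def estimate_prob_sum_seven(trials, seed=0):
    """Estimate P(two fair dice sum to 7) by simulating many rolls."""
    rng = random.Random(seed)  # seeded for a repeatable run
    hits = 0
    for _ in range(trials):
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            hits += 1
    return hits / trials

p = estimate_prob_sum_seven(100_000)
print(p)  # close to the exact value 6/36, about 0.1667
```

Increasing the number of trials generally brings the estimate closer to the true probability.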
Neurons
Cells that send electrical signals as output in response to electrical signals or sensations as input.
Normal Distribution
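The classic "bell curve." It is commonly observed because of the Central Limit Theorem: blending (summing or averaging) many independent random effects tends to produce a normal distribution, even when the individual effects are not normally distributed (provided their variances are finite).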
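The Central Limit Theorem can be demonstrated by simulation. Each sample below adds 12 independent uniform(0, 1) random numbers and subtracts 6. This is a classic approximation to the standard normal, because each term has variance 1/12. The sample count and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(1)  # seeded for a repeatable run

# Each sample blends 12 independent uniform(0, 1) effects. Each has mean 0.5 and
# variance 1/12, so the sum has mean 6 and variance 1; subtracting 6 centers it.
samples = [sum(rng.random() for _ in range(12)) - 6 for _ in range(20_000)]

print(round(statistics.fmean(samples), 2))   # near 0
print(round(statistics.pstdev(samples), 2))  # near 1
# Share within one standard deviation of the mean: near 0.68 for a normal curve.
within_one = sum(abs(x) < 1 for x in samples) / len(samples)
print(within_one)
```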
Occipital Lobe
The region of the cerebral cortex at the back of the brain; it processes vision.
Opt-In, Opt-Out Clauses
Clauses that let a user customize an agreement or interface. "Opt-in" clauses do not apply unless the user chooses them; "opt-out" clauses apply by default unless the user declines them.
Parallel Processing
Using two or more processors (CPUs or cores) simultaneously.
Pie Chart
A visualization in which a circle is divided into slices proportional to the parts of a whole; the slices add up to 100% of some quantity.
Population Distribution
The entire set of measurements that could be made, or the theoretically infinite set of measurements, from which a sample is drawn.
Privacy Policy
A document from a company offering a service on the web that describes what data the company can collect from a user, whether the company can sell that data, and so on.
Privacy Settings
An interface where a user is given partial control of what data about themselves is collected and who can access it.
Quantitative Variable
Has numeric values with a meaningful order, such as height or price.
Range
Maximum value minus minimum value.
Redundant
When two or more machines fulfill the same purpose (storage, processing, serving a protocol, etc.).
Re-identification
Reattaching personal identities to de-identified data, often because Big Data make anonymity unlikely.
Relative Frequency
How often something occurs, expressed as a fraction or percentage of all observations.
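Relative frequency divides each frequency by the total number of observations, so the relative frequencies sum to 1 (100%). This sketch uses `collections.Counter` from the standard library to count the frequencies. The die rolls are hypothetical.

```python
from collections import Counter

rolls = [3, 6, 1, 6, 2, 6, 4, 3, 6, 5]  # hypothetical die rolls
counts = Counter(rolls)                  # frequency of each face
n = len(rolls)
relative = {face: counts[face] / n for face in sorted(counts)}
print(relative[6])  # 4 sixes out of 10 rolls: 0.4, or 40%
```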
Relative Reference
Used in a spreadsheet formula, a relative reference locates the data cell by its direction and distance from the formula's cell, so the reference shifts when the formula is copied to another cell.
Retina
Light-sensitive tissue at the back of the eyeball.
Rods
Light-sensitive neurons in the retina responsible for black-and-white vision in low light.
Sample Distribution
The set of values in a sample of measurements, as opposed to the population distribution.
Scatter Plot
A visualization in which each point plotted shows at least 2 variables: the x- and y-coordinates.
Sensitive Information
Data that is considered private, such as financial, educational, and health records.
Shape of Distribution
Symmetric, positively-skewed (with a heavier right tail), or negatively-skewed (with a heavier left tail).
Spread of a Distribution
How "wide" the distribution is, described by the range, interquartile range, or standard deviation.
Standard Deviation
The root-mean-square (RMS) deviation from the mean. Used to describe the width of a distribution, especially a normal distribution.
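"Root-mean-square deviation" describes the calculation in reverse order. First take each deviation from the mean, then square it, then average the squares, then take the square root. The sketch below follows those steps and checks the result against `statistics.pstdev`, which uses the same population formula. Note that `statistics.stdev` instead divides by n - 1 (the sample standard deviation) and gives a slightly larger value.

```python
import math
import statistics

def rms_deviation(values):
    """Standard deviation as the root-mean-square deviation from the mean."""
    mean = sum(values) / len(values)
    squared = [(x - mean) ** 2 for x in values]   # square each deviation
    return math.sqrt(sum(squared) / len(values))  # mean of squares, then root

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(rms_deviation(data))      # 2.0
print(statistics.pstdev(data))  # same population formula: 2.0
```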
Targeted Advertising
Advertising delivered to a computer user based on the advertiser's knowledge about the user.
Task Parallel
When multiple processors execute different instructions, each contributing to a job by completing a separate thread.
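In task parallelism, the workers run different tasks rather than the same task on different chunks. This sketch submits three different summary calculations on the same data to a `concurrent.futures.ThreadPoolExecutor` and collects their results. As with any CPython threads, this illustrates the structure of the work, not a guaranteed speedup. The task names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import statistics

data = [4, 8, 15, 16, 23, 42]

# Task parallelism: each worker runs a different task on the same data.
tasks = {"mean": statistics.fmean, "median": statistics.median,
         "range": lambda xs: max(xs) - min(xs)}

with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    futures = {name: pool.submit(fn, data) for name, fn in tasks.items()}
    results = {name: f.result() for name, f in futures.items()}

print(results)  # {'mean': 18.0, 'median': 15.5, 'range': 38}
```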
Terms of Service
The legal contract between a user and a company offering a service on the Web.
The Standard Normal Distribution
A normal distribution with mean 0 and standard deviation 1.
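Data are usually converted to this scale by standardizing. Each value becomes a z-score, (x - mean) / standard deviation, which gives the transformed data a mean of 0 and a standard deviation of 1. Standardizing a normal distribution gives the standard normal distribution, but standardizing does not make non-normal data normal. It only shifts and rescales the values. The function name is illustrative.

```python
import statistics

def standardize(values):
    """Convert each value to a z-score: (x - mean) / standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [(x - mean) / sd for x in values]

z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```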
Threads
Separate, independent tasks within a job.
Transformation
Creating a new data set by applying a calculation or algorithm to another data set.
Uniform Distribution
A distribution in which every value is equally likely, like the outcomes 1, 2, 3, 4, 5, and 6 when rolling a fair die.
View-Through Rate
The percentage of viewers who visit an advertiser's site, either by clicking on the ad when it is shown or by visiting the site later.
Visualization
A graphical representation of data.
