Chapter 1: Exploring Data

Individuals
The objects described by a set of data. Individuals may be people, animals, or things.
Variable
Any characteristic of an individual. A variable can take different values for different individuals.
Categorical Variable
Places an individual into one of several groups or categories.
Quantitative Variable
Takes numerical values for which it makes sense to find an average.
Distribution
The distribution of a variable tells us what values the variable takes and how often it takes these values.
Inference
Drawing conclusions that go beyond the data at hand.
Frequency table
Displays the count (frequency) of observations in each category or class.
Relative frequency table
Shows the percents (relative frequencies) of observations in each category or class.
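Ex: a minimal Python sketch of a frequency table and a relative frequency table. The data and the names (colors, counts, percents) are made up for illustration.

```python
from collections import Counter

# Hypothetical data: one categorical value per individual.
colors = ["red", "blue", "red", "green", "blue", "red", "red", "blue"]

counts = Counter(colors)        # frequency table: category -> count
total = sum(counts.values())    # number of observations

# Relative frequency table: category -> percent of all observations.
percents = {category: 100 * count / total for category, count in counts.items()}

for category in counts:
    print(f"{category:<6} {counts[category]:>3} {percents[category]:>6.1f}%")
```

The percents in a relative frequency table should add up to 100 (apart from rounding).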
Pie chart
Shows the distribution of a categorical variable as a "pie" whose slices are sized by the counts or percents for the categories. A pie chart must include all the categories that make up a whole.
Bar graph
Used to display the distribution of a categorical variable or to compare the sizes of different quantities. The horizontal axis of a bar graph identifies the categories or quantities being compared. Drawn with blank spaces between the bars to separate the items being compared.
Two-way table
A two-way table of counts organizes data about two categorical variables.
Marginal distribution
The marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table.
Conditional distribution
Describes the values of one variable among individuals who have a specific value of another variable. There is a separate conditional distribution for each value of the other variable.
Association
Occurs between two variables if specific values of one variable tend to occur in common with specific values of the other.
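Ex: a small Python sketch of marginal and conditional distributions computed from a two-way table of counts. The table and its labels (gender, opinion) are hypothetical.

```python
# Hypothetical two-way table of counts: rows are one categorical variable
# (gender), columns are the other (opinion).
table = {
    "female": {"yes": 30, "no": 20},
    "male": {"yes": 10, "no": 40},
}
opinions = ["yes", "no"]

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of opinion: column totals as a share of ALL individuals.
marginal_opinion = {
    o: sum(row[o] for row in table.values()) / grand_total for o in opinions
}

# Conditional distribution of opinion for each gender: each row's counts
# as a share of that row's total only.
conditional_opinion = {
    g: {o: row[o] / sum(row.values()) for o in opinions}
    for g, row in table.items()
}

print(marginal_opinion)
print(conditional_opinion)
```

In this made-up table, 60% of females but only 20% of males answered "yes". Because the conditional distributions differ, the two variables appear to be associated.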
Dotplot
In a dotplot, each data value is shown as a dot above its location on a number line.
Shape
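When describing the shape of a distribution, concentrate on the main features: the number of peaks and whether the distribution is roughly symmetric or clearly skewed.
Ex: left-skewed, symmetric, right-skewed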
Center
We can describe the center by finding a value that divides the observations so that about half take the larger values and about half take the smaller values.
Spread
The spread of a distribution tells us how much variability there is in the data.
Outliers
An outlier is an individual value that falls outside the overall pattern of the distribution.
Symmetric
A distribution is roughly symmetric if the right and left sides of the graph are approximately mirror images of each other.
Skewed to the right
When the right side of the graph (the tail of larger values) is much longer than the left side.
Skewed to the left
When the left side of the graph (the tail of smaller values) is much longer than the right side.
Unimodal
Having a single peak.
Bimodal
Having two peaks.
Stemplot
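A simple graphical display for fairly small data sets. Each observation is split into a stem (all but the final digit) and a leaf (the final digit), and the leaves are listed in order next to their stem.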
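Ex: a rough Python sketch of a stemplot for two-digit whole numbers, with the tens digit as the stem and the ones digit as the leaf. The data are made up.

```python
from collections import defaultdict

# Hypothetical two-digit data values.
data = [12, 15, 21, 23, 23, 30, 34, 38, 41]

stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)  # tens digit is the stem, ones digit the leaf
    stems[stem].append(leaf)

# List every stem from smallest to largest, even one with no leaves,
# so that gaps in the data stay visible.
lines = []
for stem in range(min(stems), max(stems) + 1):
    leaves = "".join(str(leaf) for leaf in stems[stem])
    lines.append(f"{stem} | {leaves}")

print("\n".join(lines))
```

A real stemplot also needs a key (for example, "2 | 3 means 23") so the reader knows the units.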
Histogram
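A graph that groups the values of a quantitative variable into classes (ranges) of equal width and draws a bar for each class. The height of each bar shows the count or percent of observations in that class.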
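Ex: a minimal Python sketch of the counting step behind a histogram. The data and the class width of 10 are assumptions chosen for illustration.

```python
# Hypothetical whole-number data and an assumed class width of 10.
data = [12, 15, 21, 23, 23, 30, 34, 38, 41]
width = 10

# Count how many observations fall in each class [lower, lower + width).
counts = {}
for value in data:
    lower = (value // width) * width
    counts[lower] = counts.get(lower, 0) + 1

# Text version of the histogram: one row per non-empty class,
# one '#' per observation in that class.
rows = [
    f"{lower}-{lower + width - 1}: {'#' * counts[lower]}"
    for lower in sorted(counts)
]
print("\n".join(rows))
```

Changing the class width can change how the shape looks, so it is worth trying more than one.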
Mean
x̄ (x-bar): to find the mean of a set of observations, add their values and divide by the number of observations.
The mean is not resistant to outliers.
Median
The median, M, is the midpoint of a distribution, the number such that half the observations are smaller, and the other half are larger.
The median is resistant to outliers.
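Ex: a short Python sketch comparing how one extreme value affects the mean and the median. The data are made up.

```python
from statistics import mean, median

# Hypothetical data, then the same data with one extreme value added.
data = [2, 3, 4, 5, 6]
with_outlier = data + [100]

print(mean(data), median(data))                  # mean and median agree here
print(mean(with_outlier), median(with_outlier))  # only the mean moves far
```

Here the mean jumps from 4 to 20 when the value 100 is added, while the median only moves from 4 to 4.5.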
Range
Difference between the largest and smallest observations.
The range shows the full spread of the data.
Quartiles
Q1 (first quartile) - lies 1/4 of the way up the ordered list
Q3 (third quartile) - lies 3/4 of the way up the ordered list
Q1 and Q3 mark the boundaries of the middle 50% of the data.
Interquartile range (IQR)
Measures the range of the middle 50% of the data: IQR = Q3 - Q1.
The IQR is resistant to outliers.
Five-Number Summary
Consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation written in order from smallest to largest
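Ex: a Python sketch of the five-number summary, IQR, and range. It assumes one common textbook convention: Q1 and Q3 are the medians of the observations below and above the position of the overall median. Software often uses other quartile rules and may give slightly different values. The function name and data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for at least two observations.

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves of the ordered data; the middle observation is left out of
    both halves when the number of observations is odd.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

# Hypothetical data.
data = [5, 7, 10, 14, 18, 19, 25, 29, 31, 33]
minimum, q1, med, q3, maximum = five_number_summary(data)

iqr = q3 - q1                   # spread of the middle 50% of the data
data_range = maximum - minimum  # full spread of the data

print(minimum, q1, med, q3, maximum)
print(iqr, data_range)
```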
Boxplot
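A graph of the five-number summary of a distribution. A box spans Q1 to Q3 with a line at the median, and whiskers extend to the smallest and largest observations that are not outliers.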
Standard Deviation
Measures the typical distance of the observations from their mean. It is calculated by finding an average of the squared distances (dividing their sum by n - 1) and then taking the square root.
The standard deviation measures spread and is NOT resistant to outliers.
Variance
The variance is the average squared distance of the observations from their mean.
(the standard deviation squared)
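Ex: a step-by-step Python sketch of the variance and standard deviation. It assumes the sample version, where the sum of squared distances is divided by n - 1 rather than n. The data are made up.

```python
from math import sqrt

# Hypothetical data.
data = [1, 2, 3, 4, 5]
n = len(data)

x_bar = sum(data) / n                                # the mean
squared_distances = [(x - x_bar) ** 2 for x in data]

# Assumption: sample variance, so divide by n - 1 instead of n.
variance = sum(squared_distances) / (n - 1)
std_dev = sqrt(variance)                             # square root of the variance

print(x_bar, variance, round(std_dev, 3))
```

For this data the squared distances are 4, 1, 0, 1, 4, so the variance is 10 / 4 = 2.5 and the standard deviation is about 1.58.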
Resistant
A resistant measure is not strongly affected by extreme observations such as outliers.
1.5 x IQR Rule
Call an observation an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile.
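Ex: a small Python sketch of the 1.5 x IQR rule. The function name and data are hypothetical, and Q1 and Q3 are supplied directly (here Q1 = 10 and Q3 = 31, taken as the medians of the lower and upper halves of the ordered data) because different quartile conventions give slightly different values.

```python
def find_outliers(values, q1, q3):
    """Return the values lying more than 1.5 x IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    return [x for x in values if x < low_cutoff or x > high_cutoff]

# Hypothetical data with assumed quartiles Q1 = 10 and Q3 = 31.
data = [5, 7, 10, 14, 18, 19, 25, 29, 31, 33, 70]
outliers = find_outliers(data, q1=10, q3=31)

print(outliers)
```

With IQR = 21, the cutoffs are 10 - 31.5 = -21.5 and 31 + 31.5 = 62.5, so only 70 is flagged as an outlier.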
Helpful Tips...
Always plot your data.
Begin with graphs.
Add numeric summaries (such as the five-number summary).
Don't forget your SOCS (Shape, Outliers, Center, Spread).