Chapter 1: Exploring Data


Individuals
The objects described by a set of data. Individuals may be people, animals, or things.
Variable
Any characteristic of an individual. A variable can take different values for different individuals.
Categorical Variable
Places an individual into one of several groups or categories.
Quantitative Variable
Takes numerical values for which it makes sense to find an average.
Distribution
The distribution of a variable tells us what values the variable takes and how often it takes these values.
Inference
Drawing conclusions that go beyond the data at hand.
Frequency table
Displays the count (frequency) of observations in each category or class.
Relative frequency table
Shows the percents (relative frequencies) of observations in each category or class.
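A frequency table and a relative frequency table can be built from the same list of categories. The sketch below uses made-up data (the `colors` list is hypothetical) to show the counts next to the percents.

```python
from collections import Counter

# Hypothetical categorical data: favorite color of 10 individuals.
colors = ["red", "blue", "blue", "green", "red",
          "blue", "red", "blue", "green", "blue"]

# Frequency table: the count of observations in each category.
frequency = Counter(colors)

# Relative frequency table: the percent of observations in each category.
total = len(colors)
relative_frequency = {
    category: 100 * count / total for category, count in frequency.items()
}

for category in sorted(frequency):
    print(f"{category:<6} {frequency[category]:>3} {relative_frequency[category]:>6.1f}%")
```

The percents in a relative frequency table should add up to 100% (apart from rounding).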
Pie chart
Shows the distribution of a categorical variable as a "pie" whose slices are sized by the counts or percents for the categories. A pie chart must include all the categories that make up a whole.
Bar graph
Used to display the distribution of a categorical variable or to compare the sizes of different quantities. The horizontal axis of a bar graph identifies the categories or quantities being compared. The bars are drawn with blank spaces between them to separate the items being compared.
Two-way table
A two-way table of counts organizes data about two categorical variables.
Marginal distribution
The marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table.
Conditional distribution
Describes the values of one variable among individuals who have a specific value of another variable. There is a separate conditional distribution for each value of the other variable.
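As a small illustration of marginal and conditional distributions, the sketch below works from a made-up two-way table of counts. The `table` values, and the choice of gender and sport as the two variables, are assumptions for the example only.

```python
# Hypothetical two-way table of counts: rows = gender, columns = preferred sport.
table = {
    "female": {"soccer": 20, "tennis": 30},
    "male": {"soccer": 40, "tennis": 10},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of gender: each row total divided by the grand total.
marginal_gender = {
    gender: sum(row.values()) / grand_total for gender, row in table.items()
}

# Conditional distribution of sport for each gender:
# each count divided by the total of its own row.
conditional_sport = {
    gender: {sport: count / sum(row.values()) for sport, count in row.items()}
    for gender, row in table.items()
}

print(marginal_gender)
print(conditional_sport)
```

Here the conditional distributions of sport differ between the two rows, which is the kind of pattern to look for when comparing groups.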
Association
An association occurs between two variables if specific values of one variable tend to occur in common with specific values of the other.
Dotplot
A graph in which each data value is shown as a dot above its location on a number line.
Shape
When describing shape, concentrate on the main features.
Ex: left-skewed, symmetric, right-skewed
Center
A value that divides the observations so that about half take larger values and about half take smaller values.
Spread
The spread of a distribution tells us how much variability there is in the data.
Outlier
An individual value that falls outside the overall pattern of the distribution.
Symmetric
A distribution is roughly symmetric if the right and left sides of the graph are approximately mirror images of each other.
Skewed to the right
The right side of the graph (the side with the larger values) is much longer than the left side.
Skewed to the left
The left side of the graph (the side with the smaller values) is much longer than the right side.
Unimodal
Having a single peak.
Bimodal
Having two peaks.
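Stemplot
A simple graphical display for fairly small data sets. Each observation is split into a stem (all but the final digit) and a leaf (the final digit); the stems are listed in a column and the leaves are written in order next to their stem.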
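A stemplot can be sketched in a few lines of code. The helper below (`stemplot` is a hypothetical name) assumes non-negative whole numbers, using the tens as the stem and the ones digit as the leaf; the data are made up.

```python
from collections import defaultdict

def stemplot(values):
    """Return the rows of a text stemplot for non-negative whole numbers."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)  # stem = tens, leaf = ones digit
    rows = []
    # List every stem from smallest to largest, even if it has no leaves.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}")
    return rows

data = [23, 12, 38, 15, 21, 40, 23]  # hypothetical observations
for row in stemplot(data):
    print(row)
```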
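Histogram
A graph that groups the values of a quantitative variable into classes (ranges) of equal width and draws a bar over each class whose height shows the count or percent of observations in that class.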
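The counting step behind a histogram can be sketched as follows. The function name `histogram_counts`, the class boundaries, and the data are all assumptions chosen for illustration; each class includes its lower boundary and excludes its upper boundary.

```python
def histogram_counts(values, start, width, num_classes):
    """Count the observations falling in each equal-width class."""
    counts = [0] * num_classes
    for value in values:
        index = int((value - start) // width)  # which class the value falls in
        if 0 <= index < num_classes:
            counts[index] += 1
    return counts

data = [3, 7, 12, 14, 18, 21, 25, 29]  # hypothetical observations
counts = histogram_counts(data, start=0, width=10, num_classes=3)

# Print a rough text histogram: one star per observation in the class.
for i, count in enumerate(counts):
    print(f"{i * 10:>2} to <{(i + 1) * 10:<3} {'*' * count}")
```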
Mean
To find the mean, x̄ ("x-bar"), of a set of observations, add their values and divide by the number of observations: x̄ = (x1 + x2 + ... + xn) / n.
The mean is not resistant to outliers.
Median
The median, M, is the midpoint of a distribution: the number such that half the observations are smaller and the other half are larger.
The median is resistant to outliers.
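To see what resistance means in practice, the sketch below compares the mean and median of a small made-up data set before and after one value is replaced by an extreme one. Both lists are hypothetical.

```python
from statistics import mean, median

data = [2, 3, 4, 5, 6]           # hypothetical observations
with_outlier = [2, 3, 4, 5, 60]  # same data with the largest value made extreme

print(mean(data), median(data))                  # both measures agree here
print(mean(with_outlier), median(with_outlier))  # the mean is pulled up; the median is not
```

The single extreme value drags the mean from 4 up to 14.8, while the median stays at 4.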
Range
The difference between the largest and smallest observations.
The range shows the full spread of the data.
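Quartiles
The first quartile, Q1, lies 1/4 of the way up the ordered list of observations.
The third quartile, Q3, lies 3/4 of the way up the ordered list of observations.
The quartiles mark the boundaries of the middle 50% of the data.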
Interquartile range (IQR)
Measures the range of the middle 50% of the data: IQR = Q3 - Q1.
The IQR is resistant to outliers.
Five-Number Summary
Consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.
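A minimal sketch of a five-number summary follows. It assumes one common textbook convention: Q1 is the median of the observations below the overall median's position and Q3 is the median of those above it (the median itself is left out when the count is odd). Software often uses other quartile conventions, so results can differ slightly. The function name and data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # observations below the median's position
    upper = ordered[half + n % 2:]  # observations above the median's position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # hypothetical observations
minimum, q1, med, q3, maximum = five_number_summary(data)
print(minimum, q1, med, q3, maximum)
print("IQR =", q3 - q1)
```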
Boxplot
A graph of the five-number summary of a distribution.
Standard Deviation
Measures the typical distance of the observations from their mean. It is calculated by finding an average of the squared distances (dividing their sum by n - 1) and then taking the square root.
The standard deviation measures spread and is NOT resistant to outliers.
Variance
The average squared distance of the observations from their mean (the standard deviation squared).
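The calculation can be followed step by step in a short sketch. It assumes the usual sample convention of dividing the sum of squared distances by n - 1 rather than n; the data are made up.

```python
from math import sqrt

data = [2, 4, 4, 6, 9]  # hypothetical observations
n = len(data)

x_bar = sum(data) / n                                 # the mean
squared_distances = [(x - x_bar) ** 2 for x in data]  # squared distance of each value from the mean
variance = sum(squared_distances) / (n - 1)           # "average" squared distance (n - 1 divisor)
std_dev = sqrt(variance)                              # back to the original units

print(x_bar, variance, round(std_dev, 3))
```

With these values the mean is 5, the squared distances sum to 28, the variance is 28 / 4 = 7, and the standard deviation is the square root of 7, about 2.646.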
Resistant
A resistant measure is not strongly affected by outliers.
1.5 x IQR Rule
Call an observation an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile.
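The rule translates directly into two cutoffs. In the sketch below the function name `find_outliers` is hypothetical, the data are made up, and the quartiles are supplied by hand (Q1 = 3.5 and Q3 = 11 are the medians of the lower and upper halves of this list).

```python
def find_outliers(values, q1, q3):
    """Return the values flagged as outliers by the 1.5 x IQR rule."""
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr   # anything below this is an outlier
    high_cutoff = q3 + 1.5 * iqr  # anything above this is an outlier
    return [v for v in values if v < low_cutoff or v > high_cutoff]

data = [1, 3, 4, 6, 7, 8, 10, 12, 30]  # hypothetical observations
print(find_outliers(data, q1=3.5, q3=11))
```

Here the IQR is 7.5, so the cutoffs are 3.5 - 11.25 = -7.75 and 11 + 11.25 = 22.25, and only the value 30 is flagged.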
Helpful Tips...
Always plot your data.
Begin with graphs.
Add numerical summaries (such as the five-number summary).
Don't forget your SOCS (Shape, Outliers, Center, Spread).