38 terms

Individuals

The objects described by a set of data. Individuals may be people, animals, or things.

Variable

Any characteristic of an individual. A variable can take different values for different individuals.

Categorical Variable

Places an individual into one of several groups or categories.

Quantitative Variable

Takes numerical values for which it makes sense to find an average.

Distribution

The distribution of a variable tells us what values the variable takes and how often it takes these values.

Inference

Drawing conclusions that go beyond the data at hand.

Frequency table

Displays the count (frequency) of observations in each category or class.

Relative frequency table

Shows the percents (relative frequencies) of observations in each category or class.
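As a small illustration of the two tables above, the sketch below builds a frequency table and a relative frequency table with Python's standard library. The color data and the variable names are made up for the example.

```python
from collections import Counter

# Hypothetical categorical data: favorite color reported by 10 individuals.
colors = ["red", "blue", "blue", "green", "red",
          "blue", "red", "blue", "green", "blue"]

# Frequency table: count of observations in each category.
freq = Counter(colors)

# Relative frequency table: percent of observations in each category.
total = sum(freq.values())
rel_freq = {category: 100 * count / total for category, count in freq.items()}

# Print both tables side by side, one row per category.
for category in sorted(freq):
    print(f"{category:<6} {freq[category]:>3} {rel_freq[category]:>6.1f}%")
```

The percents in a relative frequency table should add up to 100 (apart from rounding), which is a quick check that no category was left out.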

Pie chart

Shows the distribution of a categorical variable as a "pie" whose slices are sized by the counts or percents for the categories. A pie chart must include all the categories that make up a whole.

Bar graph

Used to display the distribution of a categorical variable or to compare the sizes of different quantities. The horizontal axis of a bar graph identifies the categories or quantities being compared. The bars are drawn with blank spaces between them to separate the items being compared.

Two-way table

A two-way table of counts organizes data about two categorical variables.

Marginal distribution

The marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table.

Conditional distribution

Describes the values of one variable among individuals who have a specific value of another variable. There is a separate conditional distribution for each value of the other variable.
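To make marginal and conditional distributions concrete, here is a minimal sketch that computes both from a two-way table of counts. The survey data, the variable names, and the helper `conditional_opinion` are hypothetical and exist only for this example.

```python
from collections import Counter

# Hypothetical data: (gender, opinion) recorded for each of 10 individuals.
responses = [
    ("female", "yes"), ("female", "yes"), ("female", "no"),
    ("female", "yes"), ("male", "no"), ("male", "yes"),
    ("male", "no"), ("male", "no"), ("female", "no"), ("male", "yes"),
]

# Two-way table of counts, keyed by (gender, opinion).
table = Counter(responses)
total = sum(table.values())

# Marginal distribution of opinion: percents among ALL individuals.
opinion_counts = Counter(opinion for _, opinion in responses)
marginal_opinion = {o: 100 * c / total for o, c in opinion_counts.items()}

def conditional_opinion(gender):
    """Distribution of opinion among individuals with the given gender."""
    group_total = sum(c for (g, _), c in table.items() if g == gender)
    return {o: 100 * c / group_total
            for (g, o), c in table.items() if g == gender}

print(marginal_opinion)
print(conditional_opinion("female"))
print(conditional_opinion("male"))
```

In this made-up data the conditional distributions of opinion differ between the two groups, which is what an association between the two variables looks like in a two-way table.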

Association

Occurs between two variables if specific values of one variable tend to occur in common with specific values of the other.

Dotplot

In a dotplot, each data value is shown as a dot above its location on a number line.

Shape

When describing shape, concentrate on the main features.

Ex: left-skewed, symmetric, right-skewed

Center

We can describe the center by finding a value that divides the observations so that about half take the larger values and about half take the smaller values.

Spread

The spread of a distribution tells us how much variability there is in the data.

Outliers

An individual value that falls outside the overall pattern of the distribution.

Symmetric

A distribution is roughly symmetric if the right and left sides of the graph are approximately mirror images of each other.

Skewed to the right

When the right side of the graph (the tail containing the larger values) is much longer than the left side.

Skewed to the left

When the left side of the graph (the tail containing the smaller values) is much longer than the right side.

Unimodal

Having a single peak

Bimodal

Having two peaks

Stemplot

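A simple graphical display for fairly small data sets. Each observation is split into a stem (all but the final digit) and a leaf (the final digit); the leaves are listed in increasing order next to their stem.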
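A stemplot can be sketched in a few lines of code. The function below is an illustrative helper (not a standard library routine) that assumes non-negative whole-number data, uses the tens as the stem and the ones digit as the leaf, and expects at least one value. The test scores are invented.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a stemplot for non-negative integers.

    Stem = the value with its ones digit removed; leaf = the ones digit.
    """
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting puts leaves in increasing order
        leaves[v // 10].append(v % 10)
    lines = []
    # Include every stem between the smallest and largest, even empty ones,
    # so that gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = "".join(str(leaf) for leaf in leaves[stem])
        lines.append(f"{stem} | {leaf_text}")
    return lines

# Hypothetical test scores.
scores = [72, 85, 91, 68, 77, 85, 90, 59, 73]
for line in stemplot(scores):
    print(line)
```

Keeping the empty stems is a deliberate choice: dropping them would hide gaps and distort the shape of the distribution.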

Histogram

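A graph that groups the values of a quantitative variable into user-specified classes (ranges) of equal width and draws a bar over each class, with the bar's height showing the count or percent of observations in that class.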
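The counting step behind a histogram can be sketched as follows. The helper `histogram_counts`, its parameters, and the scores are assumptions made for illustration; each class includes its left endpoint and excludes its right endpoint, which is one common convention.

```python
def histogram_counts(values, start, width, n_classes):
    """Count observations in n_classes equal-width classes.

    Class k covers start + k*width <= value < start + (k+1)*width.
    Values outside all classes are ignored.
    """
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts

# Hypothetical test scores, grouped into classes 50-59, 60-69, ..., 90-99.
scores = [72, 85, 91, 68, 77, 85, 90, 59, 73]
counts = histogram_counts(scores, start=50, width=10, n_classes=5)

# Text version of the histogram: one row per class, one '*' per observation.
for k, count in enumerate(counts):
    low = 50 + 10 * k
    print(f"{low}-{low + 9}: {'*' * count}")
```

Changing `width` changes the appearance of the histogram, so it is worth trying more than one class width before describing the shape.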

Mean

x̄ ("x-bar"): to find the mean of a set of observations, add their values and divide by the number of observations.

The mean is not resistant to outliers.

Median

The median, M, is the midpoint of a distribution: the number such that half the observations are smaller and the other half are larger.

The median is resistant to outliers.
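The difference in resistance between the mean and the median is easy to see numerically. The sketch below uses Python's `statistics` module on a small set of invented salaries (in thousands), then adds one extreme value.

```python
from statistics import mean, median

# Hypothetical salaries, in thousands of dollars.
salaries = [30, 32, 35, 38, 40]

# The same data with one extreme observation added.
with_outlier = salaries + [400]

print("mean  :", mean(salaries), "->", round(mean(with_outlier), 1))
print("median:", median(salaries), "->", median(with_outlier))
```

Adding the single value 400 pulls the mean far above most of the data, while the median barely moves. That is what "resistant" means in practice.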

Range

Difference between the largest and smallest observations.

The range shows the full spread of the data.

Quartiles

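Q1 (first quartile): lies one-quarter of the way up the ordered list of observations.

Q3 (third quartile): lies three-quarters of the way up the ordered list of observations.

The quartiles mark the boundaries of the middle 50% of the data.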

Interquartile range (IQR)

Measures the range of the middle 50% of the data: IQR = Q3 - Q1.

The IQR is resistant to outliers.

Five-Number Summary

Consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.
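A five-number summary can be computed as in the sketch below. It assumes one particular quartile convention: Q1 is the median of the observations below the median's position and Q3 is the median of those above it. Software packages often use other conventions and may give slightly different quartiles. The function name and the data are hypothetical, and the function expects at least two observations.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Quartiles are the medians of the lower and upper halves of the
    ordered data; the overall median is excluded when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # observations below the median's position
    upper = data[half + n % 2:]    # observations above the median's position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical data set with 9 observations.
data = [5, 7, 10, 14, 17, 21, 25, 28, 40]
minimum, q1, med, q3, maximum = five_number_summary(data)
print(minimum, q1, med, q3, maximum)
print("IQR =", q3 - q1)
```

The interquartile range falls out directly as the difference between the fourth and second entries of the summary.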

Boxplot

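A graph of the five-number summary of a distribution: a box spans from the first quartile to the third quartile with a line at the median, and whiskers extend to the smallest and largest observations that are not outliers.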

Standard Deviation

Measures the typical distance of the observations from their mean. It is calculated by finding an average of the squared distances (dividing by n - 1 for sample data) and then taking the square root.

The standard deviation measures spread and is NOT resistant to outliers.

Variance

The variance is the average squared distance of the observations from their mean.

(standard deviation squared)
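The calculation of the variance and standard deviation can be written out step by step. This sketch assumes the sample convention of dividing the sum of squared distances by n - 1 (dividing by n instead gives the population version). The function names and the data are made up for the example, and at least two observations are required.

```python
from math import sqrt

def sample_variance(values):
    """Average squared distance from the mean, dividing by n - 1."""
    n = len(values)
    x_bar = sum(values) / n                           # the mean
    squared_distances = [(x - x_bar) ** 2 for x in values]
    return sum(squared_distances) / (n - 1)

def sample_std_dev(values):
    """Square root of the sample variance."""
    return sqrt(sample_variance(values))

# Hypothetical data: number of pets owned by 9 individuals.
pets = [1, 3, 4, 4, 4, 5, 7, 8, 9]
print("variance:", sample_variance(pets))
print("std dev :", round(sample_std_dev(pets), 2))
```

Python's `statistics.variance` and `statistics.stdev` use the same n - 1 convention, so they can serve as a cross-check on the hand calculation.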

Resistant

A resistant measure is one that is not strongly affected by extreme observations (outliers).

1.5 × IQR Rule

Call an observation an outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile.

Helpful Tips...

Always plot your data.

Begin with graphs.

Add numeric summaries (such as the five-number summary).

Don't forget your SOCS (Shape, Outliers, Center, Spread).
