41 terms

Individuals

Individuals are objects described by a set of data. Individuals may be people, but they may also be animals or things.

Variable

A variable is any characteristic of an individual. A variable can take different values for different individuals.

Categorical Variable

A categorical variable places an individual into one of several groups or categories.

Quantitative Variable

A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense.

Distribution

The distribution of a variable tells us what values the variable takes and how often it takes these values.
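
As a small illustration of this definition, the sketch below tabulates a distribution as counts and proportions. The function name `distribution` and the eye-color data are made up for the example.

```python
from collections import Counter

def distribution(values):
    """Return {value: (count, proportion)} for a list of observations."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, count / n) for value, count in counts.items()}

# Hypothetical data: eye color recorded for eight individuals.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "blue"]
dist = distribution(eye_colors)

for value, (count, proportion) in dist.items():
    print(value, count, proportion)
```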

Bar Graph

A graph that uses bars to show the count or percent of individuals in each category of a categorical variable.

Pie Chart

A chart that shows the relationship of each part to the whole, with one slice of a circle per category.

Dotplot

Used for quantitative data. Each observation is drawn as a dot above its value on a number line. The number line doesn't have to start at 0, but no numbers can be skipped once counting starts. Similar to a histogram, except drawn with dots.

Stemplot

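Used for small sets of quantitative data. Split each number into a stem (all but the last digit) and a leaf (the last digit). List the stems in a column, smallest on top and largest on bottom, without skipping any; write each stem only once and put its leaves beside it in increasing order. The result looks like a histogram turned on its side.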
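
A stemplot can be sketched in a few lines of code. This is only a minimal version, assuming non-negative whole numbers, the last digit as the leaf, and repeated values kept as repeated leaves; the function name `stemplot` and the data are made up for the example.

```python
from collections import defaultdict

def stemplot(data):
    """Return the rows of a stemplot as strings (stem | leaves)."""
    leaves = defaultdict(list)
    for x in sorted(data):
        leaves[x // 10].append(x % 10)  # stem = all but last digit, leaf = last digit
    rows = []
    # Walk every stem from smallest to largest so that empty stems are not skipped.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows

# Hypothetical data set.
ages = [12, 15, 15, 23, 27, 41, 48, 49]
rows = stemplot(ages)
for row in rows:
    print(row)
```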

Histogram

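A graph for quantitative data that groups the values into classes of equal width and draws one bar per class, with the bar's height showing how many observations fall in that class. It looks like a bar graph with the bars touching, and it is used for large sets of data.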

Symmetric

A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

Skewed right

A distribution is skewed to the right if the right side of the histogram (containing the half of the observations with larger values) extends much farther out than the left side.

Skewed left

A distribution is skewed to the left if the left side of the histogram extends much farther out than the right side.

Outlier

An outlier in any graph of data is an individual observation that falls outside the overall pattern of the graph.

Percentile

The pth percentile of a distribution is the value such that p percent of the observations fall at or below it.
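
Read the other way around, this definition says that a value's percentile is the percent of observations at or below it. A minimal sketch, with a made-up function name and made-up test scores:

```python
def percent_at_or_below(data, value):
    """Percent of observations in data that are less than or equal to value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

# Hypothetical test scores.
scores = [55, 62, 68, 70, 74, 79, 81, 85, 90, 96]
p = percent_at_or_below(scores, 85)
print(p)  # 8 of the 10 scores are at or below 85
```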

Relative Cumulative Frequency Graph

A graph of relative cumulative frequency: the cumulative frequency of each class divided by the total number of observations, which gives a percentage, usually written in decimal form.
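
The arithmetic behind the graph can be sketched as follows. The class counts are hypothetical and the function name is made up for the example.

```python
from itertools import accumulate

def relative_cumulative_frequency(class_counts):
    """Running total of the class counts, divided by the total count."""
    total = sum(class_counts)
    return [running / total for running in accumulate(class_counts)]

# Hypothetical counts for five consecutive classes.
counts = [2, 5, 8, 4, 1]
rcf = relative_cumulative_frequency(counts)
print(rcf)  # the last entry is always 1.0
```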

Ogive

cumulative distribution plotted against class boundaries rather than class marks

- usually "less than" distribution

Time Plot

A time plot of a variable plots each observation against the time at which it was measured, with time on the horizontal axis.

Trend

A long-term upward or downward movement over time.

Seasonal Variation

A pattern that repeats itself at regular time intervals.

Exploratory Data Analysis

An examination of data using statistical tools and ideas in order to describe their main features.

Deviations

The deviation of an observation is its distance from the mean, x - x̄. The variance s^2 of a set of observations is the average of the squares of the deviations of the observations from their mean.

Mean*

To find the mean of a set of observations, add their values and divide by the number of observations.

Resistant Measure

A measure that can resist the influence of extreme observations. The median is resistant; the mean is not.

Median

The middle number in a set of numbers that are listed in order. If the count of numbers is even, it is the mean of the two middle numbers.
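
A quick way to see the difference between the mean and the median is to add one extreme observation and recompute both. The salary figures below are made up for the example; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical salaries, in thousands.
salaries = [30, 32, 35, 38, 40]
with_extreme = salaries + [400]  # add one extreme observation

print(mean(salaries), median(salaries))          # both equal 35
print(mean(with_extreme), median(with_extreme))  # the mean jumps, the median barely moves
```

The single extreme value pulls the mean far above every other salary, while the median stays close to the middle of the original data.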

Range

Measurement of spread, the difference between the largest and smallest observations

First Quartile

The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median.

Third Quartile

The third quartile Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.

Interquartile Range (IQR)

The interquartile range (IQR) is the distance between the first and third quartiles:

IQR = Q3 - Q1

Five-Number Summary

The five-number summary of a data set consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.
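
The quartile definitions above can be turned into a short sketch. It assumes the convention that the overall median is left out of both halves when the number of observations is odd; the function name and data are made up for the example.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]                                   # left of the overall median
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]  # right of the overall median
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

# Hypothetical data set.
data = [1, 3, 4, 7, 8, 10, 12, 15, 20]
summary = five_number_summary(data)
iqr = summary[3] - summary[1]  # IQR = Q3 - Q1
print(summary, iqr)
```

Software packages often use slightly different quartile rules, so results from a calculator or library may differ a little from this hand method.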

Modified Boxplot

A modified boxplot is a graph of the five-number summary, with outliers plotted individually.
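
The definition above does not say how outliers are identified. A common convention, assumed here and not stated in this set, is the 1.5 x IQR rule: an observation is flagged if it lies more than 1.5 x IQR below Q1 or above Q3. A minimal sketch, with made-up function names and data:

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves of the ordered data."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)

def suspected_outliers(data):
    """Observations outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR (assumed rule)."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data set with one unusually large value.
data = [1, 3, 4, 7, 8, 10, 12, 15, 45]
flagged = suspected_outliers(data)
print(flagged)
```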

Variance*

Average of the squares of the deviations of the observations from their mean. For the sample variance s^2, the sum of squared deviations is divided by n - 1 rather than n.

Standard Deviation*

The square root of the variance. It measures roughly the typical distance from the observations to the mean.

Degrees of Freedom

Number of scores that can vary freely in the calculation of a statistic. For the variance of n observations this is n - 1, because the deviations always sum to 0.
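
Deviations, variance, standard deviation, and degrees of freedom fit together in one short calculation. The sketch below assumes the sample versions, which divide by n - 1; the function names and data are made up for the example.

```python
from math import sqrt

def deviations(data):
    """Each observation minus the mean."""
    xbar = sum(data) / len(data)
    return [x - xbar for x in data]

def sample_variance(data):
    """Sum of squared deviations divided by the degrees of freedom, n - 1."""
    return sum(d * d for d in deviations(data)) / (len(data) - 1)

def sample_sd(data):
    """Square root of the sample variance."""
    return sqrt(sample_variance(data))

# Hypothetical data set with mean 5.
data = [3, 3, 5, 7, 7]
devs = deviations(data)
s2 = sample_variance(data)
s = sample_sd(data)
print(devs, sum(devs), s2, s)
```

Because the deviations sum to 0, knowing any four of the five deviations here fixes the fifth, which is why only n - 1 of them can vary freely.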

Center

a value that attempts the impossible by summarizing the entire distribution with a single number, a "typical" value.

Spread

a numerical summary of how tightly the values are clustered around the "center".

Quartiles

The lower quartile is the value with a quarter of the data below it; the upper quartile is the value with a quarter of the data above it.

Linear Transformations

A linear transformation on data is of the form x-new = a + bx, where a and b are numbers.
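
A familiar instance is converting Celsius to Fahrenheit, where a = 32 and b = 1.8. The sketch below uses made-up temperatures and checks a standard consequence of the formula: the mean becomes a + b times the old mean, while the standard deviation is only multiplied by |b|.

```python
from statistics import mean, stdev

def linear_transform(data, a, b):
    """Apply x-new = a + b*x to every observation."""
    return [a + b * x for x in data]

# Hypothetical temperatures in degrees Celsius.
celsius = [10, 15, 20, 25, 30]
fahrenheit = linear_transform(celsius, a=32, b=1.8)

print(mean(celsius), stdev(celsius))
print(mean(fahrenheit), stdev(fahrenheit))
```

Adding a shifts the center but does not change the spread; multiplying by b rescales both.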

Back-to-Back Stemplots

use this when you want to compare two related distributions.

Side-by-Side Boxplots

can be used to compare the distributions of two data sets.