33 terms

pinkgurl10PLUS

68-95-99.7 rule

In the Normal distribution with mean μ and standard deviation σ: Approximately 68% of the observations fall within σ of the mean μ. Approximately 95% of the observations fall within 2σ of μ. Approximately 99.7% of the observations fall within 3σ of μ.
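The three percentages are rounded. A short Python sketch can check them. It uses only the standard library, and the name `prob_within` is chosen just for illustration. It computes the exact area within k standard deviations of the mean using P(-k < Z < k) = erf(k/√2):

```python
import math

def prob_within(k: float) -> float:
    """Proportion of a Normal distribution lying within k standard deviations
    of its mean: P(-k < Z < k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sd: {prob_within(k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```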

cases

The objects described by a set of data. They may be customers, companies, subjects in a study, or other objects.

density curve

A curve that describes the overall pattern of a distribution. The area under the curve and above any range of values is the proportion of all observations that fall in that range. It is always on or above the horizontal axis and has area exactly 1 underneath it.

boxplot

A graph of the five-number summary with the following properties: 1. A central box spans the quartiles Q1 and Q3. 2. A line in the box marks the median M. 3. Lines extend from the box out to the smallest and largest observations or to a cutoff for suspected outliers.

five-number summary

Of a set of observations, this consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. In symbols, it is: Minimum Q1 M Q3 Maximum
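The following is a minimal Python sketch of computing the five-number summary. It assumes the quartile rule given under "quartiles": each quartile is the median of the observations on one side of the median's position, and the middle value is left out when n is odd. Other software may use a different quartile rule and give slightly different Q1 and Q3. The function names are hypothetical.

```python
def median_sorted(x):
    """Median of an already-sorted, non-empty list."""
    n = len(x)
    mid = n // 2
    return x[mid] if n % 2 == 1 else (x[mid - 1] + x[mid]) / 2

def five_number_summary(data):
    """Return (Minimum, Q1, M, Q3, Maximum). Needs at least 2 observations."""
    x = sorted(data)
    n = len(x)
    half = n // 2
    lower = x[:half]              # positions left of the median
    upper = x[half + n % 2:]      # positions right of the median (skip the middle value if n is odd)
    return (x[0], median_sorted(lower), median_sorted(x), median_sorted(upper), x[-1])

print(five_number_summary([7, 1, 13, 5, 9, 3, 11]))  # (1, 3, 7, 11, 13)
```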

interquartile range

Abbreviated IQR, this is the distance between the first and third quartiles: IQR = Q3 - Q1
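A sketch of the IQR in Python, with quartiles found as the medians of the lower and upper halves of the ordered data. It also shows one widely used cutoff for suspected outliers, values more than 1.5 × IQR below Q1 or above Q3. That 1.5 multiplier is an assumption here, because this glossary does not state the cutoff. `iqr_and_fences` is a hypothetical name.

```python
from statistics import median

def iqr_and_fences(data):
    """Return (IQR, low fence, high fence). Q1 and Q3 are the medians of the
    lower and upper halves of the ordered data, excluding the middle value when
    n is odd. The fences use the common 1.5 x IQR convention (an assumption)."""
    x = sorted(data)
    n = len(x)
    q1 = median(x[: n // 2])
    q3 = median(x[(n + 1) // 2:])
    iqr = q3 - q1
    return iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [1, 3, 5, 7, 9, 11, 40]
iqr, low, high = iqr_and_fences(data)
suspects = [v for v in data if v < low or v > high]
print(iqr, suspects)  # 8 [40]
```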

linear transformation

A change of a variable, often from one unit of measurement to another. It changes the original variable x into the new variable x_new given by an equation of the form x_new = a + bx.
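A small Python illustration follows, using hypothetical Celsius readings converted to Fahrenheit (a = 32, b = 1.8). It shows the standard effect of a linear transformation on summary statistics. The mean obeys the same equation, mean_new = a + b·mean. The standard deviation is multiplied by |b|, and the shift a does not affect it.

```python
from statistics import mean, stdev

def linear_transform(data, a, b):
    """Apply x_new = a + b*x to every observation."""
    return [a + b * x for x in data]

celsius = [10.0, 20.0, 25.0, 30.0]               # hypothetical readings
fahrenheit = linear_transform(celsius, 32, 1.8)  # F = 32 + 1.8*C

# The mean is transformed by the same equation...
print(mean(fahrenheit), 32 + 1.8 * mean(celsius))
# ...while the standard deviation is multiplied by |b| and ignores the shift a.
print(stdev(fahrenheit), abs(1.8) * stdev(celsius))
```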

mean

Denoted "x-bar", this is the average value of a distribution. It can be found by adding the values of a set of observations and then dividing by the number of observations.
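The recipe above, written as a minimal Python function (the name `mean` is illustrative; the standard library also provides `statistics.mean`):

```python
def mean(values):
    """x-bar: add the observations, then divide by how many there are."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

print(mean([2, 4, 4, 5, 7]))  # 4.4
```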

median

Denoted M, this is the midpoint of a distribution. Half of the observations are smaller than M, and the other half are larger than M.
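A Python sketch of finding M. The data are sorted first. When n is odd, M is the middle observation. When n is even, M is the average of the two middle observations. The second example suggests how little one very large value moves the median.

```python
def median(values):
    """Midpoint M of the data: the middle ordered value when n is odd,
    the average of the two middle ordered values when n is even."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    return x[mid] if n % 2 == 1 else (x[mid - 1] + x[mid]) / 2

print(median([5, 1, 9, 3]))       # 4.0
print(median([5, 1, 9, 3, 100]))  # 5
```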

Normal curves

A class of density curves that are symmetric, unimodal, and bell-shaped. They describe Normal distributions. The mean is at the center of the symmetric curve and is the same as the median.

bar graph

Graph that shows the distribution of a categorical variable by representing each category as a bar. The bar heights show the category counts or percents.
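A bar graph rests on a table of category counts and percents. The following Python sketch builds that table from hypothetical data and draws crude text bars in place of a real plot. The same counts or percents would also size the slices of a pie chart.

```python
from collections import Counter

def category_table(values):
    """Return {category: (count, percent)}: the bar heights of a bar graph."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

majors = ["Biology", "Math", "Biology", "History", "Math", "Biology"]  # hypothetical data
for cat, (n, pct) in category_table(majors).items():
    print(f"{cat:8} {n}  {pct:5.1f}%  " + "#" * n)
```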

distribution

Tells us what values a variable takes and how often it takes these values.

quantitative variable

Variable that takes numerical values for which arithmetic operations such as adding and averaging make sense.

exploratory data analysis

Uses graphs and numerical summaries to describe the variables in a data set and the relations among them.

categorical variable

Variable that places a case into one of several groups or categories.

label

A special variable used in some data sets to distinguish the different cases.

variable

A characteristic of a case.

modes

Major peaks of a distribution. A distribution with one major peak is called unimodal.

Normal quantile plot

Plot that best assesses the adequacy of a Normal model for describing a distribution of data. A pattern on such a plot that deviates substantially from a straight line indicates that the data are not Normal.
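The following Python sketch computes the points of a Normal quantile plot. Each ordered observation is paired with a standard Normal quantile. The quantile is taken at the plotting position (i - 0.5)/n, which is one common convention among several; it is an assumption, and software packages differ. Plotting the observations against these quantiles and checking for a roughly straight pattern is the visual test described above.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return (z, x) pairs: the i-th smallest observation x paired with the
    standard Normal quantile z at plotting position (i - 0.5) / n."""
    x = sorted(data)
    n = len(x)
    z = NormalDist()  # standard Normal N(0, 1)
    return [(z.inv_cdf((i - 0.5) / n), xi) for i, xi in enumerate(x, start=1)]

for zq, xi in normal_quantile_points([12.1, 9.8, 10.4, 11.0, 10.0]):
    print(f"{zq:6.3f}  {xi}")
```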

skewed

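Description of a distribution in which one tail is much longer than the other. It is skewed to the right if the right tail (larger values) is much longer than the left tail (smaller values), and skewed to the left if the left tail is much longer than the right tail.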

outlier

An individual value that falls outside the overall pattern of the distribution.

histogram

A graphical display of a quantitative variable that breaks the range of values of a variable into classes and displays only the count or percent of the observations that fall into each class. Chosen classes should always be of equal width.
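The binning step behind a histogram is sketched below in Python: the range is split into equal-width classes and the observations in each are counted. Two choices here are assumptions made for illustration: each class includes its left endpoint, and the maximum is placed in the last class. The function name is hypothetical.

```python
def histogram_counts(data, num_classes):
    """Split [min, max] into num_classes equal-width classes and count the
    observations in each. Returns (class edges, counts)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_classes
    counts = [0] * num_classes
    for x in data:
        k = int((x - lo) / width) if width > 0 else 0
        counts[min(k, num_classes - 1)] += 1  # the maximum goes in the last class
    edges = [lo + i * width for i in range(num_classes + 1)]
    return edges, counts

edges, counts = histogram_counts([2, 3, 5, 7, 8, 9, 12, 14, 15, 20], 3)
print(edges, counts)  # [2.0, 8.0, 14.0, 20.0] [4, 3, 3]
```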

pie chart

Graph that shows the distribution of a categorical variable as a "pie" whose slices are sized by the counts or percents for the categories.

stem plot

A display of the distribution of a quantitative variable that separates each observation into a stem and a one-digit leaf.
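A Python sketch of the stem/leaf split. It assumes non-negative integer data, so the leaf is the final digit and the stem is everything before it. Stems with no leaves are still listed, so gaps in the distribution stay visible. `stemplot` is a hypothetical name.

```python
from collections import defaultdict

def stemplot(data):
    """Return the lines of a stemplot for non-negative integers:
    stem = all but the last digit, leaf = the last digit."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        leaves[stem].append(leaf)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):  # include empty stems
        lines.append(f"{stem:>3} | " + "".join(str(d) for d in leaves.get(stem, [])))
    return lines

print("\n".join(stemplot([38, 12, 31, 15, 40, 33])))
```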

symmetric

Description of a distribution if the values smaller and larger than its midpoint are mirror images of each other.

time plot

Display of the distribution of a variable that plots each observation against the time at which it was measured. Always put time on the horizontal scale of the plot and the variable you are measuring on the vertical scale.

cumulative proportion

The proportion of observations in a distribution that lie at or below a given value. When the distribution is given by a density curve, this is the area under the curve to the left of a given value.
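Both versions of the idea can be shown in Python. The data and the N(70, 10) model below are hypothetical. For raw data, the cumulative proportion counts the observations at or below the value. For a Normal density curve, `statistics.NormalDist.cdf` gives the area to the left of the value.

```python
from statistics import NormalDist

def cumulative_proportion(data, value):
    """Proportion of observations at or below value."""
    return sum(1 for x in data if x <= value) / len(data)

scores = [55, 61, 64, 70, 72, 75, 81, 88]   # hypothetical data
print(cumulative_proportion(scores, 70))    # 0.5

# Density-curve version: area under N(70, 10) to the left of 80.
print(round(NormalDist(mu=70, sigma=10).cdf(80), 4))  # 0.8413
```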

Normal distribution

Described by bell-shaped, symmetric, unimodal density curves. Completely specified by the mean μ and standard deviation σ. The mean is the center of symmetry, and σ is the distance from μ to the change-of-curvature points on either side.

standard Normal distribution

The Normal distribution N(0,1) with mean 0 and standard deviation 1.

z-score

A standardized value that tells us how many standard deviations the original observation falls away from the mean, and in what direction. Observations larger than the mean are positive when standardized, and observations smaller than the mean are negative.
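The usual formula is z = (x - μ)/σ. A short Python sketch with hypothetical numbers shows the sign convention: values above the mean standardize to positive z, and values below it to negative z.

```python
def z_score(x, mu, sigma):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

print(z_score(85, 70, 10))  # 1.5
print(z_score(62, 70, 10))  # -0.8
```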

quartiles

Used to describe the spread of a distribution. The first quartile, Q1, has one-fourth of the observations below it, and the third quartile, Q3, has three-fourths of the observations below it. Q1 is the median of the observations whose positions in the ordered list are to the left of the location of the overall median. Q3 is the median of the observations whose positions in the ordered list are to the right of the location of the overall median.

standard deviation

Denoted s, this measures the spread of a distribution by looking at how far the observations are from their mean. It is the square root of the variance.

variance

Denoted s². For a set of n observations, this is an average of the squares of the deviations of the observations from their mean, found by dividing the sum of the squared deviations by n - 1 rather than n.
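The following Python sketch computes s² by dividing the sum of squared deviations by n - 1, and then takes its square root to get s. The function names are illustrative. As a cross-check, the standard library's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor.

```python
import math
import statistics

def sample_variance(data):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def sample_sd(data):
    """s: the square root of the variance."""
    return math.sqrt(sample_variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, squared deviations sum to 32
print(sample_variance(data), statistics.variance(data))
print(sample_sd(data), statistics.stdev(data))
```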